The case for a 7-hop limit and high-power infrastructure nodes on MeshOregon

There’s been a lot of off-and-on chatter in the PDXMesh Discord about decreasing hop limits on nodes because it’s “not necessary” to keep them at 7. In addition, there has been some controversy in the local (Willamette Valley) amateur radio community regarding the use of higher transmit power nodes for infrastructure.

In this blog post, I’d like to take a moment to explain why 7 is the best selection for 99% of use cases with MeshOregon, and why higher power nodes help the mesh rather than hurt it. First, I’ll explain where the idea that hop counts must be kept low came from in the first place.

While they may seem like two separate items, these two subjects are 100% intertwined and key to success on the Willamette Valley MeshOregon network.

This post is 100% organic and AI-free, so if it jumps around a little, I’d like to apologize in advance.

How packets move around a mesh

When Meshtastic was built, the early experimentation was based on a sort of “true mesh”, a model resting on several assumptions that are usually (but not always) made. One key false assumption that many still make to this day is that nodes, apart from the role chosen for each one, are generally equal participants in and contributors to the mesh. This theory is what I refer to as a “true mesh”.

If you’re not familiar with how Meshtastic packets propagate through a mesh, I highly suggest taking a moment to learn about the managed flood algorithm that Meshtastic uses. Some sources you can use to learn are:

This Meshtastic wiki page
This mesh simulator from Alex K2XAP with nyme.sh
This other mesh simulator from NH Meshtastic

Without getting too into the weeds (don’t worry, there will be time for that later), the summary is that:

A non-router node won’t rebroadcast a packet it hears more than once.
A router will always rebroadcast every packet it hears.
A packet’s path through the mesh is, thus, effectively determined by which nodes rebroadcast it first.
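For readers who think in code, here’s a minimal sketch of those three rules as a single decision function. It models only the simplified rules in this post, not Meshtastic’s actual firmware. The `Role` enum, the function name, and its parameters are hypothetical. The hop-limit check anticipates a rule covered a bit later.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical, simplified roles; real Meshtastic has many more.
    CLIENT = "client"
    ROUTER = "router"


def should_rebroadcast(role: Role, times_heard: int, hops_left: int) -> bool:
    """Simplified rebroadcast rule from this post (not firmware logic).

    times_heard: how many copies of this packet the node has heard so far
    hops_left: hop limit remaining on the copy the node received
    """
    if hops_left <= 0:
        # The packet arrived with its hop limit exhausted; nobody relays it.
        return False
    if role is Role.ROUTER:
        # Routers relay every packet they hear.
        return True
    # Non-routers stay quiet once they have heard the packet more than once.
    return times_heard == 1


print(should_rebroadcast(Role.CLIENT, 2, 3))  # False: heard it twice
print(should_rebroadcast(Role.ROUTER, 2, 3))  # True: routers always relay
```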

If we take as our goal a near-100% packet delivery rate to all connected nodes, then, in a perfect world, we wouldn’t have a hop limit. We’d just propagate the packet out to all nodes until it reaches everyone. But here in the real world, we run into an issue: limited data capacity. In Meshtastic, the share of that capacity in use is reported as channel utilization, or chutil. For mesh success to remain high, this number must remain low, or you’ll start to get congestion and packet collisions, which increase packet loss.

In mesh simulators, you’ll sometimes see a metric called “usefulness”. This is an important metric, and the key driver behind recommendations to keep hop counts low. Usefulness is the percentage of the radio traffic a device hears that is new information. Let’s picture a room with 10 people acting like nodes. One person wants to send a message to everyone by talking. We’ll place them in a simple circle where everyone can hear each other, and give them an “algorithm” much simpler than Meshtastic’s: “everyone repeat everything they hear, as soon as they hear it.”

A moment later, the room erupts into a cacophony of chaos as the message attempts to propagate using our simplified “algorithm”. Since everyone heard the original node’s message at the same time, everyone repeats it… at the same time. All of these “rebroadcasts” are unintelligible as everyone is talking over each other.

If we wanted to describe the “usefulness” of the data here, we’d say it’s only 10% useful – because out of the 10 times the packet was “transmitted”, only one of those was “necessary” – everyone already got the packet from the initial transmission. But because a mesh network is decentralized, the nodes don’t know if the propagation was complete. They only follow their algorithm – in this case, a very bad one (which is not what Meshtastic uses). Not only that, but the whole “channel” became unusable in this circumstance – nobody could hear each other, as they were all being “stepped on” (talking at the same time) by each other.

So, let’s take this one step further and change our algorithm slightly: “Repeat everything you hear, as soon as you can, without talking over someone else.” We’ll leave out the technicalities of collision avoidance for now. Let’s send out another important message.

And now, following their algorithms, the others respond, but this time one at a time. This takes a little while, because they each have to repeat it.

Okay, so we fixed one issue. We can hear everyone now. But, we still have major issues with our mesh algorithm. Assuming it takes each person 3 seconds to speak the message, that means that this packet’s propagation took 30 seconds, during which time the channel was fully utilized. Further, the usefulness remains unchanged at only 10%, because none of the others actually had to repeat this message. While this “algorithm” ensures complete delivery, it is the least efficient way to do so. 27 of the 30 seconds were wasted.

Okay, that’s enough playing with terrible algorithms. Let’s take a simple version of Meshtastic’s algorithm. One key thing I haven’t mentioned yet, which now starts to come into play, is how a node decides how long to wait before transmitting. For the purposes of this example, the simplified summary is “the furthest away node transmits first”.
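Here’s a hypothetical sketch that makes “the furthest away node goes first” a little more concrete. In it, a weaker received signal (a rough stand-in for distance) means a shorter wait before rebroadcasting. The SNR range, the 1000 ms window, the linear mapping, and the zero wait for routers are all illustrative assumptions, not Meshtastic’s real numbers. The footnotes list what the real algorithm takes into account.

```python
def rebroadcast_delay_ms(snr_db: float, is_router: bool = False,
                         snr_min: float = -20.0, snr_max: float = 10.0,
                         window_ms: float = 1000.0) -> float:
    """Hypothetical wait before rebroadcasting; weaker signals wait less.

    All numbers here are illustrative assumptions, not firmware values.
    """
    if is_router:
        # Assumption: routers "cut in line" by not waiting at all.
        return 0.0
    # Clamp the SNR into the assumed range so the delay stays inside the window.
    clamped = max(snr_min, min(snr_max, snr_db))
    # Map linearly: weakest signal (likely furthest node) -> 0 ms, strongest -> window_ms.
    return (clamped - snr_min) / (snr_max - snr_min) * window_ms


far, near = rebroadcast_delay_ms(-15), rebroadcast_delay_ms(5)
print(round(far, 1), round(near, 1))  # 166.7 833.3 -> the far node goes first
```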

Ok, so here’s our new algorithm. “Rebroadcast any packet you hear only once. Don’t talk over anyone else. The person who is the furthest from the sender goes first.” And, here we go.

Great, everyone heard the message. And through Magic(tm), the furthest away person knows that they’re the furthest, and goes first. Aren’t simulated perfect worlds so great and easy to conceptualize?

This node retransmits the message. This is pointless, because everyone already heard it – but it has no way of knowing that, it only follows its algorithm. Because every other node has now heard the message twice, and none of them have special jobs (router roles), they’ve all now been silenced. This change brought our usefulness up from 10% to 50%, and we now only tied up the channel for 6 seconds instead of 30. Our delivery rate remains at 100%. Much better!
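The arithmetic behind these numbers fits in a few lines. This sketch assumes, as in the example, that every transmission takes 3 seconds and that transmissions happen one after another. The function names are just for illustration.

```python
def usefulness(necessary: int, total: int) -> float:
    """Percentage of transmissions that actually delivered something new."""
    return 100 * necessary / total


def airtime_s(transmissions: int, seconds_each: float = 3.0) -> float:
    """Channel time used, assuming transmissions happen one after another."""
    return transmissions * seconds_each


# Everyone repeats in turn: 1 original + 9 repeats, only the original was needed.
print(usefulness(1, 10), airtime_s(10))  # 10.0 30.0
# Managed flood: the original plus one rebroadcast by the furthest node.
print(usefulness(1, 2), airtime_s(2))    # 50.0 6.0
```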

The only way to make this more efficient in this scenario would be to only speak once, since we know everyone can hear. But we don’t actually KNOW that everyone can hear – and in a real world scenario we’re not all standing around in a tight circle to play with mesh stuff. So, let’s complicate things a little and get another step closer to real life.

In this next scenario, we’re spread out around the room, and we’re going to talk quieter. This decreases our range and makes meshing, well, actually necessary.

Only two nodes heard the original message this time. They’re about the same distance apart, but with Magic(tm) they don’t talk over each other. Let’s play this out.

Ok, this got a lot more complicated kind of quickly, so let’s go over it.

Given the decreased range, several “hops” became necessary. The initial transmission hit only two nodes, which could not hear each other. Given that, they both retransmitted. This then proceeded down both chains until the packet reached 100% of the mesh. But for the first time, we have hop counts.

In this example, transmission completes in about 15 seconds, though the “useful” transmission completes after the 3rd hop on the left chain. Overall, 7 transmissions occurred. Of those, 6 were necessary for the data to propagate throughout the mesh. Two nodes that heard the packet were inhibited from retransmitting, because they heard it twice. Note also that because not every node was in range of the others, multiple retransmissions could occur simultaneously in different areas without “talking over each other”. In this scenario, the only way to add efficiency with our given layout and range would have been to limit the number of hops to 3. This decreases the overall radio use time by 3 seconds and eliminates an unnecessary retransmit from the node at the upper left. Great. On paper, we made an improvement. But rather than drawing more and more complex layouts, I’m going to ask you to conceptualize this next part. What if, instead, we limited the hops to 2?

Our transmission is now finished after only 2 hops, ending after 9 seconds. Counting the sender, 70% of the mesh has the message. But 30% of the mesh did not receive it. The message reached the final node in the left chain with its hop limit exhausted, so that node could not rebroadcast it. This is an awfully small limit on an awfully small example mesh, which makes it look like a ridiculous restriction we imposed artificially. Still, these kinds of pragmatic decisions in the mesh algorithm are necessary in most circumstances, especially in a conceptual “true mesh”. In the real world, it’s not just one person talking. Everyone has something to say, and everyone wants their message to get out. And not everything they have to say is even important. “Hi, I’m Joe,” your node says for the 10th time today. Even though everyone in the room already knows your name, everyone in the room repeats “That’s Joe.”
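Here’s a best-case sketch of how a hop limit caps reach. The 10-node layout is hypothetical and is not the diagram from this example. It is arranged so that the final node of the left chain is the only link to three other nodes. The sketch assumes a node at link distance d hears the packet when d is at most hop_limit + 1: the original transmission covers the first link, and each permitted hop adds one more. It ignores suppression and collisions, so real reach can only be lower.

```python
from collections import deque

# Hypothetical 10-node layout (not the post's diagram): pairs of nodes that can hear each other.
LINKS = [
    ("S", "A1"), ("A1", "A2"), ("A2", "A3"),    # left chain
    ("A3", "X1"), ("A3", "X2"), ("A3", "X3"),   # nodes only A3 can reach
    ("S", "B1"), ("B1", "B2"), ("B2", "B3"),    # right chain
]


def build_graph(links):
    """Undirected adjacency sets from a list of links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reach_fraction(graph, sender, hop_limit):
    """Best-case share of nodes (including the sender) that receive the packet."""
    # Breadth-first search gives each node's link distance from the sender.
    dist = {sender: 0}
    queue = deque([sender])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    # Original transmission reaches distance 1; each allowed hop adds one link.
    reached = sum(1 for d in dist.values() if d <= hop_limit + 1)
    return reached / len(graph)


graph = build_graph(LINKS)
for limit in (1, 2, 3):
    print(limit, reach_fraction(graph, "S", limit))
# 1 0.5
# 2 0.7
# 3 1.0
```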

Tangent: Routers

Alright, just a side note that I’m gonna jam in here awkwardly. Meshtastic has roles, which modify the algorithm’s behavior on certain nodes. One of those roles is router. Essentially, what you need to know here is that a router “cuts in line” to retransmit before everyone else, and it ALWAYS* retransmits, no matter how many times it’s heard a packet. Routers should generally be used at sites that have significantly better connectivity to the mesh than most nodes, like on a tower or tall building where their range is exceptional for the given mesh. In our example, we’ll depict a router in red, and a node in an “exceptional position” (with good range) as having a megaphone. When the two factors combine, we have the potential to greatly increase deliverability and efficiency/usefulness using fewer hops. Neat!

When routers are improperly deployed, their self-prioritization “eats hops” and significantly increases chutil without a purpose. Because of the rules about preemption and hearing a packet twice, packets can end up taking bad routes even when better ones exist. Look at how inefficient this route is, and it didn’t even hit the whole mesh!

These suboptimal deployments and “bad routes” (which sometimes also just happen by chance) are another reason hop limits became so necessary. From a network management perspective, even if one packet has a bad time in circumstances like this, the network as a whole has a better chance if hops and chutil are properly managed.

(Just a reminder: This post is actually going to argue for a HIGHER hop limit on MeshOregon… I know it doesn’t seem like that yet… but it wouldn’t be right to skip the backstory!)

Back on topic…

The meshes just kept growing and growing – not just in node count, but in physical size. Some meshes began to span hundreds of miles and connect hundreds of nodes – far more than was originally anticipated. Meshes began to collapse under their own weight as packets flooded around from all directions bringing chutil to new high levels. More and more, actions had to be taken to ensure the viability of the mesh in passing packets reliably. (Yes, I am continuing to oversimplify – we’ll get there)

Many people started to feel frustrated: their clients could see hundreds of nodes across their region, yet they could hardly get a message across their neighborhood reliably. Meanwhile, the recommendations grew louder and louder. “Be a responsible member of your mesh. Reduce your hop limit. Reduce your traffic. Decrease chutil and make the mesh better for everyone” – or, put another way – “Your mesh is too big and has too much to say. Shrink it, and talk less.”

A competitor appears

I’m going to skip over a lot of the drama on this. A competing protocol, MeshCore, appeared. They made a lot of design decisions which appear to have been informed by real world data gathered from Meshtastic meshes. With regard to the discussion we’re having here, the most important changes are:

Not all nodes retransmit. Only designated “repeaters”, intended to be mounted on roofs and other high places, retransmit.
- This is notable as it is an attempt to tackle the difference between a theoretical mesh (where all nodes are equal) and a real mesh (where they very much aren’t). You don’t want your low power node in your pocket to retransmit when there’s a roof node 100′ away that will get out much further.
- Yes, I am aware that there are Meshtastic roles which accomplish the same, and use of them was also encouraged. We’re mostly avoiding the nuanced role discussion at this point though.
The hop limit has been increased dramatically – to 64.
- Clearly, maximum propagation is the priority – and understandably so.
In order to compensate for the above, unnecessary message broadcasts (telemetry, etc) have been dramatically reduced or eliminated, leaving as much airtime as possible for text messages.

People immediately saw increased success with MeshCore. At least here in the PNW, frustration with Meshtastic on LongFast had become extreme. MeshCore has been especially successful in the local ham radio community, which has built a mesh that can sometimes pass messages across incredible distances, from Eugene up to Seattle and beyond. This is impressive, not just from a user perspective, but from a network management perspective as well.

Meshtastic continued to improve

It was no secret at this point what made meshes work and what didn’t. MeshCore had an advantage in that it was built with all of this real-world data in hand. Meshtastic, by contrast, was working with an older codebase and maintaining backwards compatibility. This put the two in fundamentally different positions with regard to how they could continue to iterate and improve their systems. That being said, Meshtastic made real changes that make a HUGE improvement in mesh performance for nodes that are properly configured and up to date. I would go as far as to say that, at this point, the two protocols are in very similar positions with regard to functionality.

Warning: Opinions ahead

I want to be clear that we’re transitioning now from the part of the post where I give you the backstory to the part where I start to integrate more of my opinions and conclusions. I don’t want to present myself as some sort of authority on the matter, but I don’t say what I say without a good reason. By day, I work as an IT consultant and systems administrator. I design, install, evaluate, and repair complex systems. I think this, plus my experience in the MeshOregon build-out and the opinions I’ve gathered from other experienced individuals across the country, puts me in a relatively okay position to make this post. But I am not a Meshtastic developer, nor am I claiming the common community advice is “incorrect.” Anyway, here we go…

A “true mesh” doesn’t work at scale with the technology available. A hierarchy is essential to the proper function of a mesh.

A Hierarchical Mesh

As testing and research continued for the MeshOregon project, and after speaking with groups like Baymesh down in California, we found some consistent information. A major key to success is a backbone of infrastructure nodes spanning the region to be covered. As we prioritized these types of nodes, we began to find that nodes generally fall into one of three categories.

Every hop a packet has to take to reach its destination decreases its odds of success, as well as raising chutil and making things more difficult for everyone else. A mesh should be built to allow packets to traverse maximum distance and reach maximum nodes in as few hops as possible.
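Here is one way to see why every hop matters. Suppose each link independently delivers a packet with some probability. The chance of crossing a single path is then that probability raised to the number of hops. The 90% figure below is purely hypothetical, and a real flood often has more than one path available, so treat this as a simplified single-path model.

```python
def end_to_end_success(per_hop_success: float, hops: int) -> float:
    """Chance a packet survives every link on one path, assuming independent hops."""
    return per_hop_success ** hops


# Hypothetical: each individual link delivers the packet 90% of the time.
for hops in (2, 4, 6):
    print(hops, round(end_to_end_success(0.9, hops), 3))
# 2 0.81
# 4 0.656
# 6 0.531
```

Cutting a path from 6 hops to 2 in this toy model raises delivery from about 53% to 81%. That is the reasoning behind building infrastructure so packets need fewer hops in the first place.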

While early mesh networks were built primarily with the lower two layers in mind, it turned out that infrastructure was the key to making everything work. Infrastructure nodes have a few key targeted goals:

Cover the maximum possible geographic area.
- This means using high end equipment and placing the nodes in high elevation locations to squeeze out every little bit of coverage you can get from the site.
Span large distances to connect directly with far-away infrastructure.
- The further away you can make direct infra-infra connections, the fewer hops (and less overall radio time) you need to carry the same traffic.
Provide maximum possible direct packet penetration into the lower mesh layers.
- This is just another way to reduce necessary hops and radio time. If your infra node can get directly into people’s pocket nodes, that’s fewer hops that are needed for the network as a whole. More on that coming up…
Provide geographic diversity in coverage.
- Since we’re not all living on a two-dimensional plane, it’s important that packets come at pocket nodes from multiple angles. Infra should be placed strategically to send packets “down” from multiple angles/directions to increase success.

With regard to our goals for a good mesh (maximum coverage, minimum packet loss, and maximum capacity), well-built infrastructure is the most essential component.

Let’s take a look at an example of a packet moving through a hierarchical mesh.

In this example, the packet originates from the car on the right side. It hits a nearby roof (middle layer) node to help it “elevate” its low power signal from the mobile/handheld layer up to the top “infrastructure” layer. That node connects to two nearby infrastructure sites – a tall building, and a tower. From there, the packet is in the infrastructure network, and is easily distributed directly to all of the clients in the area without requiring excessive rebroadcasts or hops – keeping overall chutil low. Now let’s take a look at an example of the same attempt, minus infrastructure. Remember, every hop decreases our odds of success (and potentially “usefulness”), plus increases overall chutil.

This is of course a bit of an extreme example, but that doesn’t make it unlikely – it more reflects my unwillingness to spend multiple days creating increasingly large and complex diagrams. Regardless, you can see that the packet had to make many hops in this scenario in order to cross all the way across the town.

Why higher transmit power infrastructure helps

In a poorly built mesh network, such as the default LongFast mesh in our area, packet paths are unreliable and inconsistent. Because of the lack of infrastructure, it may take many hops to cross even a small distance. Your packet takes roundabout paths because of node placement and other issues like congestion/collisions, interference, and the randomness of the mesh algorithm.

With infrastructure in the right places, higher (but still legal) transmit power does more than expand our maximum usable range (from 10-20 miles to 40+). It also significantly improves our penetration into the lower layers of the mesh, directly reaching partially obstructed nodes on short roofs, in cars, or in pockets. In an ideal situation, this means packets delivered from the infrastructure network hit the majority of the mesh without requiring retransmission. In addition, when a node in a well-covered area can receive from multiple infrastructure sources, non-infrastructure retransmissions of infrastructure-delivered packets are effectively suppressed across a large portion of the mesh. That drastically reduces the number of retransmissions that are both necessary (to achieve deliverability) and permitted by the algorithm (due to router preemption and the rule that most roles won’t rebroadcast a packet heard more than once). Even if a handheld node is covered by only one infrastructure node, this more reliable penetration into its layer makes it more likely to hear the message. That node (or another nearby node) will then “automatically adapt” to the lack of infrastructure by allowing “natural” retransmissions to take place where needed.
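For a rough feel of how extra transmit power (or antenna gain) turns into range, the standard free-space path loss formula is a reasonable starting point. In free space, path loss rises about 6 dB each time distance doubles, so about 6 dB of extra link budget roughly doubles range. Real paths with terrain, trees, and buildings lose signal faster than free space. Treat this as an optimistic upper bound, not a prediction for any MeshOregon link. The function names are illustrative.

```python
import math


def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (ideal line of sight, no terrain or foliage)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def range_multiplier(extra_db: float) -> float:
    """How much further a link reaches in free space with extra_db more link budget."""
    return 10 ** (extra_db / 20)


# Doubling the distance costs about 6 dB of extra loss in free space...
print(round(fspl_db(32, 915) - fspl_db(16, 915), 2))  # 6.02
# ...so about 6 dB of extra budget roughly doubles the free-space range.
print(round(range_multiplier(6), 2))  # 2.0
```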

Additionally, from a practicality and cost perspective, MeshOregon isn’t going to have a node on every building and cell tower. Tower sites and other very beneficial locations are VERY RARE – and good equipment is expensive. There’s no reason to NOT take full advantage of the ones we have access to.

We’ve also found that higher power can turn “iffy links” – especially in bad weather conditions or with partial obstructions – into reliable ones.

Let’s take this simplified graphic as an example.

At lower power, let’s say this link works on a good/sunny day. Let’s say we start out with a signal of “2”, lose 1 from the weather, and 1 from the trees. That’s none left by the time it reaches the other end. Now, let’s see what happens if we have the same attenuation, but we start with a higher signal of “5”.

In this example, the signal arrives weaker, but still usable, because we had extra margin to spare. The connection remains functional under more conditions, without requiring us to get perfect placement or place infrastructure closer together.
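Here is the same toy arithmetic in code. It uses the graphic’s unitless “signal” values and the rule that a link fails once nothing is left. In a real link budget, these numbers would be dB values: transmit power and antenna gains on one side, and path loss, weather, foliage, and receiver sensitivity on the other. The function name is hypothetical.

```python
def link_margin(start: float, losses: list[float]) -> float:
    """Signal left after subtracting each loss; above zero means the link still works."""
    return start - sum(losses)


weather, trees = 1, 1
print(link_margin(2, [weather, trees]))  # 0 -> nothing left, link fails
print(link_margin(5, [weather, trees]))  # 3 -> weaker, but still usable
```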

Why higher power doesn’t hurt

We’ve completed several rounds of testing using various nodes in various circumstances. On our tested hardware, we’ve been unable to find any evidence of desensitization caused by a nearby node on the same modem settings, at any reasonable distance (more than a few feet away). We don’t deny that deafness issues occur in Meshtastic nodes on current firmware, but they don’t seem to be connected to nodes on your own settings. Instead, our testing suggests deafness most likely originates from a radio locking onto what it thinks is a preamble on a nearby frequency. The radio then enters a “false” receive state that it can stay stuck in until it transmits or resets. tl;dr: Our testing shows higher power only helps; it doesn’t hurt. It provides more “bang for the buck” and significantly improves mesh connectivity and reliability, making previously difficult or impossible links an everyday occurrence without significantly increasing cost or difficulty. This has been backed up by conversations with other mesh coordinators around the country. Higher power is one of the most important tools in a mesh coordinator’s toolbox for improving success.

A note on “one way links”

Others have mentioned that high power nodes create one-way links, which, in their opinion, should be avoided. I have also heard it stated that deploying any high power nodes basically means everybody needs high power. In my personal opinion, especially given the “no-harm” testing above, these arguments don’t make sense unless your design principles require some form of node equality, which ours do not. First, if you accept that a hierarchical mesh is the ideal way to build a mesh for maximum deliverability and efficiency, then it follows that infrastructure should be as well connected to the mesh as possible, which higher power helps accomplish. Second, if packet delivery without loss is the goal, then having the majority of the entire mesh’s packets reach a node is a success. Third, because of the way a mesh works, not everyone needs a high powered node to participate. It only takes one in an area, or potentially one extra hop through a node that’s a little closer to the main mesh/infrastructure, and you’re back in the network. If you’re the only one on the preset/frequency in your area, a higher transmit power node remains the best choice, but it is by no means required. And if your area previously had no communication at all and now has a one-way link, your task of connecting to the mesh just got a whole lot easier. “All you have to do is get your packet to the nearest infrastructure. We’ll take it from there.”

Let’s take a look at a few simple depictions of what I’m talking about here.

Above, you can see a simple depiction of two isolated meshes, which have nodes that have line of sight to each other, but there isn’t a sufficiently strong signal to break through the noise across that distance. Now, let’s give one side some higher power infrastructure.

Now we have packets from the right mesh successfully being delivered into the left mesh, but no path for packets to travel back from left to right. While this is far from ideal, note that we have taken a significant step toward connecting the previously isolated meshes. We’re also making better use of our limited sites, and we’re only one step away from a full connection.

One person chose to add a higher power node to their house, and now both meshes are completely connected thanks to the two higher power nodes, even though there are several low power nodes remaining on each side. In this example, this may have also saved us 1-2 unnecessary hops and node installations in the space between the two areas.

Additionally, it’s worth noting that our receive isn’t standard either. We use radios with amplified receive, high gain antennas, and cavity filters to help ensure we can pick out very weak packets from distant and/or low power nodes. Hopefully we’ll have more posts soon about the hardware design principles for our infrastructure. For now, know that we do everything we can to squeeze performance out of infra nodes.

Back to the hops

Ok, so with all of that being said, you may be thinking “Ok, great. But everything I can find online says reduce hops to make a large mesh work better. You say increase hops. Why???”

Quite simply, there are two main reasons. First, we are using alternate methods to ensure success.

We ask users to run the latest Meshtastic firmware and properly tune their node’s settings.
- By nature of being a non-default preset, some work/research is required to connect to MeshOregon. This limits our audience to a more interested group of users, who are more likely to responsibly manage their nodes in other ways (like keeping telemetry disabled and limiting broadcast traffic, similar to MeshCore). With fewer non-essential packets to pass, the odds of success go up. Additionally, we’ve “abandoned” all of the ancient LongFast nodes which pollute the mesh on the default settings and ruin the experience.
We provide infrastructure which is able to reach a majority of the coverage area directly.
- This effectively means that unnecessary rebroadcasts are largely suppressed by nature of the network’s topology. The unnecessary chatter that normally occurs in a poorly built mesh just doesn’t happen when you have good infrastructure getting the packets as close to their destinations as possible.
We run a preset with more data capacity.
- Packets transmit faster, and the odds of collisions are significantly reduced.

Second, more hops improve the user experience. There are few things as frustrating for a new user as joining a mesh, seeing hundreds of nodes, trying to traceroute/DM, and having nothing go through. Overly restrictive hop limits artificially and unnecessarily decrease the success rate of communication. As a result, networks that otherwise appear connected work only intermittently, or not at all, on some paths.

But what about the new firmware where infra-infra links don’t eat hops? Do we still really need 7 hops? What about the fact that most messages from my area seem to arrive with only a couple hops used? Doesn’t that mean I’m fine at 3?

Really, I promise. The MeshOregon Willamette Valley network is fine at 7. We’ve done lots of testing. We’ll let you know if anything changes. We have, for our application, largely addressed the underlying reasons hops need to be restricted to low values. The mesh algorithm still exists, and a non-router still won’t rebroadcast a packet it hears more than once. If we’ve done our job properly, the infrastructure will naturally suppress unnecessary rebroadcasts and make low hop limits unnecessary.

With everyone on 7 hops, we will have a more consistent and enjoyable experience for all users, including you. This isn’t LongFast. And modern firmware really is much better. There’s a reason 7 hops is our default and recommended setting. There’s no reason to feel guilty about leaving it there. If you really want to help us keep the network running great for everyone, please follow the settings guide at https://MeshOregon.com/settings and double check that your node is behaving properly using our dashboards. We’d rather get 20 7-hop packets from you in a day than 200 unnecessary 3-hop packets.
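Here is that last comparison in rough numbers. Counting only a single relay path, each packet costs at most one original transmission plus one relay per hop. This is a deliberately simplified, hypothetical model: a real flood can involve several relays per hop, and good infrastructure usually suppresses many of them. Still, it shows that how much you send matters more than the hop limit you send it with.

```python
def worst_case_transmissions(packets: int, hop_limit: int) -> int:
    """Single-path bound: the original send plus at most one relay per allowed hop."""
    return packets * (1 + hop_limit)


print(worst_case_transmissions(20, 7))   # 160
print(worst_case_transmissions(200, 3))  # 800
```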

Limiting your hop count leads to avoidable failures, which in turn cause user frustration and packet loss. On MeshOregon, please leave your hop count at 7. Chances are your packets won’t need all 7 hops. If one does, though, those on the edges of the network, or in the area of an infrastructure failure, will thank you.

Footnotes

There are many roles in Meshtastic with nuanced features, as well as pros and cons depending on application. The discussion here is very simplified and should not replace detailed research about which role is best for your circumstances. If you’re debating which role to use on MeshOregon, though, feel free to ask in your local Discord and we’ll chime in and help.

Even “always rebroadcast” roles like routers and router_lates can sometimes drop or miss packets.

Never exceed FCC maximum transmit power limits on your nodes. MeshOregon will not provide legal advice.

The actual algorithm for determining who retransmits first involves role, received RSSI, busy channel detection, modem settings, and an element of randomness. That would take too long to explain here though and is not relevant to the discussion at hand.

Deafness issues with Meshtastic nodes are real. However, they seem to be caused more by strong signals bleeding over from other LoRa/mesh systems on nearby frequencies (the “false lock-on” described above) than by the other potential causes mentioned by community members. This is an active area of research among Meshtastic developers, though. Potential fixes that periodically “reset” the radio to help prevent this issue are being tested in the code. That isn’t especially relevant to this discussion, but it may provide some context for those interested in researching “deafness” issues further.

There is one potential circumstance when you may want to reduce your node’s hop count that I can think of – if the node is solely for telemetry monitoring for you, and you are close to it. For example, if you’re monitoring your solar system in your yard with a voltage monitor board on a node, you may only need 1-2 hops for your telemetry to make it to you, and it’s not relevant to the rest of the mesh – so that’s fine to limit.

It’s also worth noting that MeshOregon WV is currently testing a new system called Store and Forward ++. If you send a message out with a lower hop limit, this system will detect whether it failed to reach all major areas of the mesh. If so, it silently resyncs the message in the background and retransmits it in the missing areas. This means you’ll actually end up using slightly more chutil per message if you reduce your hop limit, as our system then has to “pick up the slack” for you. This is still an early test, though, and subject to change.