A minimum spanning tree of a road network

Networks are very common in our highly interconnected world. As an example, consider the network of communication between everyone in the world. It is frequently said that we could reach anyone though six intermediate people (six degrees of separation). Yet, our communication patterns are shaped mostly by the people in our immediate neighborhood.

For the purposes of this post, I'd like to concentrate on one other type of network—the road network in my country. Bulgaria has many cities and a frequent question that arises is which roads between which cities should be built and maintained. Also which roads are subject to a more heavy traffic and which could potentially wear out more easily. Each kilometer road that gets built then needs to be maintained, which can get quite expensive. It is said that a km road in Bulgaria costs on average 2.4 million euro, which can still be a lot cheaper than in other European countries. But this also shows a potentially high variability in the quality of the road. Since we have seen in the past how easy it is for roads to be corrupted after heavy traffic, rain or landslide, it probably makes sense to invest more in the more critical connections of the network. Some prevention could potentially reduce the need to apply patches later.

When building a road network, it is important to consider who will be using it. If a regular citizen could barely live on a 170 euro monthly salary, will they eventually support building roads swallowing millions? What is the relative perceived cost of the investment and when will it eventually repay? How long will Dobri need to work to finance the road in front of his house and will he need to fall deeply into debt? These are all important questions and I have still not heard any reasonable answer to them.

Our government believes that building infrastructure will eventually attract more investments in the country. While I agree that infrastructure could help, I disagree that this should happen at any cost. If most people continue to die at the present rate (see ratio deaths:newborns), the new roads here will soon become our candles of stillness since no car will be moving on them (self-driving cars are allowed to self-charge). There are many other factors like security, justice, ease of starting a business, availability of labor, cultural norms and others that all affect whether someone will invest or not. Improving one area while neglecting all others isn't likely to lead to the outcome we'd like to see. Building roads in order to obtain subsidies from the European Union also does a disservice to everyone.

Not every road in a network should be maintained equally well. For instance, if only few people live in two villages, they probably don't need to have a modern road adhering to the latest construction standards. On the opposite, if many people live in a city, it is likely that this city will need great connections in order to increase the frequency of economical transactions. What we could do then is to print a map and connect the first n cities with the highest population. This would give us an intutive feeling which roads in the country deserve to have the highest priority at a certain time. But we see that the number of connections grows quickly and that would correspond to many kilometers of road in reality. Connecting every city is not only inefficient, but raises the total cost of the network.

We can preserve some degree of connectivity while decreasing this cost. Drawing the minimum spanning tree is one way to achieve it, although it is not without limitations. In the case of a road network, its total length would give us the shortest road length that needs to exist in order to connect all cities. Having the coordinates of the cities allows us to compute the Euclidean distances between them and use them as weights to get the minimum spanning tree. We can find the real distance between two cities A and B (in km) and the distance that we can obtain from their coordinates, so this allows us to compute the scale of the map. We can find the length of the minimum spanning tree as a sum of all edge lengths (distances between each pair of cities) and multiply it by the scale of the map to determine approximately how many kilometers road we need to connect the cities. Since I could only find data about the biggest 249 cities in terms of population size, this tree will not consider villages that have less than 300 people. This is how I came to an MST corresponding to approx. 3133km of real road needed to connect all cities (straight lines). If we consider that the distance Sofia-Varna is said to be approx. 379km on a straight line (and 444km on road), then 3133km to connect all cities doesn't seem unreasonable. However, according to a statistic from 2005, Bulgaria has more than 40231km of road. This means that in our current network we maintain road whose length is the equivalent of connecting all our cities 13 times. No wonder that the money to improve the road infrastructure is never enough and that there's always a need to “prepare” the audience for more expenses. Even when we needed to add more connections to the minimum spanning tree to ensure better connectivity of the cities, they would be relatively short. If the length of the network was the average between the minimum spanning tree and the current maximal length, we would come to save a little more than 20000km of road. Multiplied by the cost per km, this results in approx. 48.2 billion euro savings. For the average citizen this sum would be entirely incomprehensible.

The economic cost of having to drive around when no direct road is available also shouldn't be underestimated, especially when it multiplies to the number of drivers that have to pay it. The cost of not fixing roads can be high as well, when they can break an unacceptably high number of new cars.

We could plot the map of the country and overlay the minimum spanning tree on it to show through which cities the roads should pass. A small map segment shows a couple of big cities and the network in which they are currently connected. If you click on it, you should see the minimum spanning tree of all cities.

One limitation of the minimum spanning tree is that it connects cities on a straight line. In reality, we have obstacles on the map like sea, lakes, mountains, rocks, national parks etc. The tree won't show us how to overcome these obstacles. In some simpler cases we might be able to apply a pathfinding algorithm to move around them, but is is not obvious how this could work on any map with any obstacle. The need to manually define what we consider as obstacles could also be potentially time-consuming. In this case we see that an edge is passing through the Black Sea, which need not be the case, but it is visible that the current infrastructure already efficiently solves this problem.

Another limitation is that in a minimum spanning tree, each road becomes a single point of failure. A defect in one can influence how reachable cities on that path become. Therefore we need to add more connections to minimize the risk of failure and to ensure that drivers will have alternative routes to choose from. Because of such context-specific decisions, inspecting the average case as I did before may still give some orientation. In the south-east region of the map we already see how paths have been added that aren't part of the tree. From forming a cycle selectively here and there, all cities in the neighborhood can benefit, especially when they are economically more important.

This minimum spanning tree shown here also doesn't tell us anything about how the roads within a city should look like. Each municipality probably has its own picture about this. But it is interesting that at a city level the minimum spanning tree can be applied again.

Despite these small defects, we see that the minimum spanning tree connects the cities relatively evenly, so it can be a useful model for the analysis of similar networks. Building a more optimal network means lower maintanence costs and reducing our resource usage.

That a network has been built a long time ago doesn't mean that it is perfect or doesn't need to change. It may be more costly to maintain an inefficient network than to rewire it again. We should always consider the advantages and disadvantages before choosing one of the two. The world is changing too. For instance, many people move from one city to another or even migrate to another country. If the cities are left without their people, then the roads that were previously built will suddenly remain unused. Also, in a case of war, the previously optimal network may have been ruined and the loss of territory would mean that connections between some cities may no longer be optimal. In this case new connections may need to be established.

I hope that this gave you an idea of how you could reduce the resource usage of your company. In a resource-constrained world this skill will be progressively more valued and needed. Over-investments break businesses, introduce corruption, deplete people of the resources they need for a normal life. We need to avoid them and bootstrap when possible.