On Saturday we held an intensive but successful event that focussed on the development challenges for CycleStreets. It was attended by computer scientists, software engineering professionals and webmasters who have experience in managing online services. This summarises some of the stuff we talked about.
Processing OpenStreetMap (OSM) data to build a Graph for Cycle Routing
We began by focussing on how CycleStreets processes OSM data into edges and vertices that can be used by a routing engine.
CycleStreets can find the shortest, fastest or quietest routes according to weightings attached to each edge. For the shortest routes the weight of edges is simply the length of the street between the two vertices. The weighting of the edges for the fastest and quietest routes is more interesting and depends on the type of street, the riding surface, whether there are any traffic lights and if it is going up or down a hill, and other factors going forward.
CycleStreets has its own routing engine that is written in Object-Orientated PHP, this language being a choice purely for legacy/historical reasons that has never been formally reviewed. The space inefficiency of PHP arrays has meant that the CycleStreets routing engine has evolved a suite of techniques for reducing the amount of data exchanged between it and the database.
To make the processing as fast as possible for serving routes live, all the weightings of edges are pre-computed. A data compression technique known as ‘Cello’ (Cellular Optimisation) reduces the number of vertices by a factor of five and the edges by two.
When planning a journey CycleStreets assumes that it needs to look at an area of the map that is up to twice as long as the straight-line distance between the start and finish points. The edges and vertices for that area are extracted from the database and either a Djikstra type or A-star algorithm is applied to produce a route.
Route planning in Central London is the most demanding for the system. An 8-mile journey extacts 22000 vertices and takes 22 seconds to create as a cache in PHP, consuming an unacceptably large amount of memory. It takes a further 2 seconds to search that cache to find a route.
The feedback from this section of the day was that:
- There are off-the-shelf programs that can process the whole of the UK in less than a second. Using one of these implementations, we could replace the small section of code that does the actual graph traversal, to get a quick speedup.
- The physics of the routing is of the greatest interest to cyclists. That is what makes CycleStreets special, and is where we add value; it is the focus of our innovation.
- We need to make it clearer how CycleStreets processes the OpenStreetMap data for cycle routing.
Since the dev day I have had a further look at options for the routing.
Our existing routing engine
At the moment our routing engine exists in two forms: stable and testing.
In the stable form the graph is simply edges and vertices. It has been reliably producing routes for several months, and although in most areas of Britain it can generate routes quickly, it does not work quickly enough in London.
The testing form takes into account the cost of turning at a node and is designed to reduce wiggliness. This has been in development for the last 3 months and has proved complex and difficult to optimise. It has reached the stage where it is nearly ready for release, but has the problem that routing with it takes too much memory.
The key to reducing the amount of data is in reducing the number of edges. In Cello it is quite easy to reduce the number of vertices, but much harder to do that for the edges.
On the developer day we discussed instead how reducing wiggliness could be taken into account by modifying the vertices on the graph, as shown below.
The cost of making the turn from S towards W,N or E shown on the left. can be incorporated into the graph as shown on the right.
The left of the image shows weighted edges joined in the middle with the costs of making the turn from node S towards each of the nodes W,N and E.The total costs of the routes are: SW = 5 + 4 + 4, SN = 5 + 1 + 5 and SE = 5 + 8 + 7.
That can be transformed into the image on the right, in which the turn costs have become extra edges between the vertices. The extra edges are one-way.
Making such a change to our graphs seems relatively simple compared to debugging Cello. At a vertex an extra edge will need to be added for each of the other edges. In this case 4 x 3 = 12 extra vertices and edges.
This modified graph could then be handled by the off-the-shelf routing engine.
Import of Data from OSM
On the developer day we did discuss how CycleStreets imports data. This summary attempts to clarify the process.
The table of how OSM tags are used by CycleStreets is shown at:
This table is orderd by the numbers of matching ways. The first row shows the most common tag pattern that matches half-a-million ways. The table has a long tail.
The table is built by a procedure involving the following steps:
The OSM way tags that are relevant to cycling are identified as the following:
For each OSM ‘way’ a row is created in the map_way_tags database table.
The above tag names are used for the field names in the rows, but suffixed with ‘Tag’, e.g. onewayTag.
The relation tag is only used where the relation it refers to is something to do with cycling.
Usually these are signed cycle routes with tags such as type=”route”, route=”bicycle” and network=”NCN”, ref=”11″.
Some ways are tagged directly with e.g. ncn_ref=”11″
Node tags such as crossings and traffic_signals are handled by later processing.
The values of the tags are checked against an allowed set of values that CycleStreets recognizes.
Where there are invalid values some repairs can be made. For instance from oneway=true to oneway=yes.
The full set of repairs is listed at:
The canonized values are copied into the similarly named fields in the map_way_tags table.
The canonized field names do not include the ‘Tag’ suffix.
So for instance a way with the following tags:
highway = “residential”
oneway = “true”
becomes a row in the map_way_tags table with these original and canonized fields:
highwayTag = residential
onewayTag = true
highway = residential
oneway = yes
The howTagsUsed table lists the contents of the map_way_tags table, grouped by the canonized fields.
Hence there is one row in that table for each unique combination of the canonized values.
3. Initial impressions
The simple mapping between the OSM highway tag and CycleStreets’ provision types defined at:
is used to create an initial suggestion for each row in the howTagsUsed table.
A set of rules is then applied to the permutations to improve the quality of the translations.
For instance a highway=”service” is intially interpreted as provision type “Service Road”.
A rule that matches bicycle=”no” converts this to “Prohibited”.
Inspecting the Translation
The /howTagsUsed/ view provides a detailed and rather complex listing of the translation.
It is possible to click on the items in that table to view the data in filtered ways.
Another way to view the information is via the map on the Journey Planner page.
When signed in with your developer account (which should also include the ‘osm’ privilege) you will have access to the ‘Ways for OSM users’ overlay. This is available via the blue plus symbol at the top right of the map.
I think we could do more to make it a lot clearer how the information has been imported and processed.
One major problem is that the import data cycle has stalled recently due to the on-going work on reducing wiggliness.
This is bad because it means the CycleStreets system is out of date, and any recently added photos will not be included on routes.
There was a long discussion about this, with a range of different views. As ‘open source people’ we are keen to Open Source the codebase, but there are a few factors which govern the timing of this, and some work to be done first.
It won’t bring in new developers straight away, but it will be an on-ramp for getting people more involved in the project.
It will improve the quality of the download and installation.
There is a danger that it could be always something we hope to do, but never actually get round to. We definitely want to avoid that.
We were given a useful overview of how the OpenCycleMap.org project works by its creator, Andy.
This site renders cycle-specific OSM data for the whole globe. It has very beautiful rendering of hills, and in fact that accounts for 96% of the data, the remaining 4% is all the other streets, including the tiny amount that have cycling data. That rendering includes all the known, verifiable cycling routes.
The success of this project keeps their servers extremely busy, but they are soon going to expand to bigger and more powerful systems.
What we did not discuss on the developer day
Time did not permit us to get onto a whole series of other things. Here’s a short list:
- Drag drop routing – this is a feature which already exists in Developer Mode but has not been exposed to users generally; it would need some discussion about permalink URLs.
- The layout of the CycleStreets pages – e.g. increasing the size of the map – which is something we’ve always had a lot of feedback about. We now have a grant which could include some design work commissioning.
- Output from CycleStreets – either as printed output or via export to GPX / KML.
Perhaps when some of the issues we’ve learned from this developer day have settled down we shall think about organising a users day to focus on some of these important, but hopefully simpler issues.