Automatic Georeferencing of Heterogeneous Historic and Illustrated Maps
Keywords: Georeferencing, Illustrated Maps, Geocoding, Historic Maps, Automatic Georeference
Abstract. Manually georeferencing historic or illustrated maps, that is, aligning them with contemporary maps, can be a difficult and time-consuming task (Fleet et al., 2012). It is generally accepted that the level of understanding necessary to correctly georeference a single image can be rather daunting (Bajcsy and Alumbaugh, 2003). The task is especially challenging in an open environment where no prior information is available to help approximate the real coordinates.
Over the last couple of decades there have been advances in the automatic georeferencing of map images, aerial photographs and raster maps (Chen et al., 2004; Desai et al., 2005; Kim et al., 2010; Cléry et al., 2014). However, there has been little discussion of heterogeneous maps. For instance, some algorithms apply fixed image processing techniques to find features within the map images, and then try to match the patterns of these features to a database of geographical information (Chen et al., 2004). The drawback of this approach is that image processing operations that work for one map style may not work for a map created in a different style. Other techniques only work for a specific kind of map, such as street maps (Desai et al., 2005) or aerial photographs (Kim et al., 2010). Furthermore, the artistic vision of the creator or the theme of the map can result in the same features being represented in different ways (Fiori, 2005). For instance, some styles or themes may highlight some roads and completely ignore others. Finally, historic maps with inaccurate cartography and contemporary illustrated maps can suffer from distortion or unusual perspective (Cajthaml, 2011).
In this paper, we present a novel algorithm that automatically bootstraps the georeferencing of historic and illustrated maps based on the text found in the map image. To accomplish this, we leverage modern OCR (Optical Character Recognition) and geocoding services in the cloud. The proposed algorithm calculates the area covered by the map and the direction of north in the image with a precision greater than 80%. This information is a great help to inexperienced users aligning and georeferencing maps for the first time. We also propose an optional machine learning module to speed up the process in dynamic environments where the time required to obtain a result is an important factor. Figure 1 shows some examples of heterogeneous maps processed with the proposed algorithm.
The proposed algorithm consists of five modules, as shown in Figure 2. The first module applies OCR to extract the text contained within the input image. The results then pass through a processing step that uses heuristics to filter out incorrect and ambiguous entries. The second module, which is optional, is a bidirectional LSTM (Long Short-Term Memory) recurrent neural network (Graves and Schmidhuber, 2005) that orders the text lines by how likely each one is to return a useful geocoding result. The third module takes the text (ordered or not) and searches for each line in a geocoding service. The output is a list of locations, each with its real-world latitude and longitude and its coordinates within the image. The fourth module calculates a matrix of distances between pairs of locations. Each entry contains the real-world geodesic distance (Karney, 2013) in meters, the Euclidean distance in pixels between the corresponding pieces of text, the resulting meters per pixel (MPP), and the rotation. We define rotation as the difference in angle between the real-world locations and their text in the image. Using the MPP and rotation as dimensions, the module finds clusters of corresponding locations and selects the largest cluster as the best. The fifth and final module uses the best cluster of locations to calculate the georeference information. This output contains the northeast and southwest corners of the map, a list of mapping points, and the angle of north in the image (counter-clockwise, where 0 degrees points up).
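The pairwise quantities computed by the fourth module can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the haversine formula on a spherical Earth stands in for the geodesic distance of Karney (2013), the rotation is taken as the real-world bearing minus the heading of the pixel vector (so that it reads as the counter-clockwise angle of north from "up"), and the function names and dictionary keys (`lat`, `lon`, `x`, `y`) are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (stand-in for a true geodesic distance)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, clockwise from north, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def pixel_heading_deg(x1, y1, x2, y2):
    """Heading of the pixel vector, clockwise from 'up' (image y grows downward)."""
    return math.degrees(math.atan2(x2 - x1, -(y2 - y1))) % 360.0


def pair_features(locations):
    """Return one record per pair of locations: meters, pixels, MPP and rotation."""
    rows = []
    for (i, a), (j, b) in combinations(enumerate(locations), 2):
        meters = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
        pixels = math.hypot(b["x"] - a["x"], b["y"] - a["y"])
        if meters == 0 or pixels == 0:
            continue  # coincident points carry no scale or angle information
        bearing = bearing_deg(a["lat"], a["lon"], b["lat"], b["lon"])
        heading = pixel_heading_deg(a["x"], a["y"], b["x"], b["y"])
        rows.append({
            "i": i, "j": j,
            "meters": meters, "pixels": pixels,
            "mpp": meters / pixels,
            # Counter-clockwise angle of north relative to image 'up' (assumed convention).
            "rotation": (bearing - heading) % 360.0,
        })
    return rows
```

Under these conventions, two locations that lie on the same meridian and appear one directly above the other in the image yield a rotation of about 0 degrees, which corresponds to a north-up map.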
The proposed algorithm has approximately twelve hyper-parameters that can be tuned. We found that one of the most important is the minimum size of the cluster used to calculate the georeference information, that is, the minimum number of corresponding locations the algorithm needs to converge.
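The role of the minimum cluster size can be sketched in Python as follows. The abstract does not specify the clustering method, so this is only an illustrative stand-in: a simple tolerance-based grouping around each candidate pair. The function names, the record keys (`i`, `j`, `mpp`, `rotation`) and the tolerance parameters `mpp_tol` and `rot_tol` are all hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def best_cluster(pairs, min_cluster_size=3, mpp_tol=0.2, rot_tol=10.0):
    """Return the sorted location indices of the largest consistent cluster.

    Each pair record links locations i and j and carries their MPP and rotation.
    Pairs that agree with a seed pair on scale (relative tolerance) and rotation
    (absolute tolerance in degrees) are grouped together. Returns None when the
    largest group has fewer than min_cluster_size locations, i.e. no convergence.
    """
    best = set()
    for seed in pairs:
        members = set()
        for p in pairs:
            close_scale = abs(p["mpp"] - seed["mpp"]) <= mpp_tol * seed["mpp"]
            close_angle = angle_diff(p["rotation"], seed["rotation"]) <= rot_tol
            if close_scale and close_angle:
                members.update((p["i"], p["j"]))
        if len(members) > len(best):
            best = members
    return sorted(best) if len(best) >= min_cluster_size else None
```

Raising `min_cluster_size` makes this sketch reject more maps but accept only those supported by more mutually consistent locations, which mirrors the trade-off between precision and recall that the hyper-parameter controls.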
Table 1 shows the results of running the algorithm on a set of 359 illustrated maps obtained from Stroly’s database (Vermeulen et al., 2011). The maps had been manually georeferenced, and this information is used as ground truth. The georeference information returned by the algorithm is considered correct when two conditions are met. First, the width of the calculated area is between 50% and 200% of the width of the real area. Second, the two areas intersect. Figure 3 visualizes some results of running the algorithm on several kinds of maps. The map area outlined in blue is the ground truth, while the one in orange is the area calculated by the presented algorithm. The blue markers are the locations that belong to the cluster used to calculate the information.
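The two correctness conditions can be written as a short Python check. This is a sketch under stated assumptions rather than the evaluation code used for Table 1: areas are modelled as axis-aligned latitude/longitude boxes, width is measured as the longitude span in degrees, boxes that merely touch are not counted as intersecting, and maps crossing the antimeridian are ignored. The `Box` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned area given by its southwest and northeast corners (degrees)."""
    south: float
    west: float
    north: float
    east: float

    @property
    def width(self) -> float:
        # Longitude span; assumes the box does not cross the antimeridian.
        return self.east - self.west


def intersects(a: Box, b: Box) -> bool:
    """True when the two boxes overlap with non-zero area."""
    return (a.west < b.east and b.west < a.east
            and a.south < b.north and b.south < a.north)


def is_correct(predicted: Box, truth: Box) -> bool:
    """Apply both conditions: width ratio within [0.5, 2.0] and overlapping areas."""
    if truth.width <= 0:
        return False
    ratio = predicted.width / truth.width
    return 0.5 <= ratio <= 2.0 and intersects(predicted, truth)
```

For example, a predicted box that is 1.3 times as wide as the ground truth and overlaps it passes the check, while a box of the right size located elsewhere, or an overlapping box 2.5 times as wide, does not.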
In conclusion, we offer a novel solution to start, and in some cases complete, the georeferencing process for heterogeneous historic and illustrated maps based on the text contained within them. The algorithm requires no vector information, geographical databases, or image preprocessing. We have shown that even with a small cluster of locations the precision of this method is greater than 80%. The precision increases when the minimum cluster size hyper-parameter is set to require larger clusters to converge (98.86% for a minimum of six locations). In future iterations we aim to improve the algorithm to increase the precision for smaller clusters and to improve the recall in general.