Description of the methods

Data extraction

We manually extracted the data from the map in a way that reduces its level of clutter. To do so, we selected cities, somewhat arbitrarily, according to their relative importance as suppliers, consumers, or transport hubs. Working in pairs, one person measured the size of the circle representing a city's coal consumption and the size of the square representing its production. Each measurement was then converted into tons using the map's scale. The first person read the resulting figure aloud, and the other entered it into a CSV file called cities.csv.
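The conversion from a measured symbol size to tons can be sketched as below. This is a minimal illustration, not the procedure we followed by hand: it assumes the map's scale can be read as a few calibration points (size in millimetres, tons per year) and interpolates linearly between them. The legend values, the column names `city`, `consumption_t` and `production_t`, and the helpers `size_to_tons` and `append_city` are all hypothetical.

```python
import bisect
import csv
import io

# Hypothetical legend calibration: (measured symbol size in mm, tons per year).
# The real map's scale values are not given in the text; these are placeholders.
LEGEND = [(2.0, 50_000), (5.0, 500_000), (10.0, 2_000_000), (20.0, 8_000_000)]

FIELDS = ["city", "consumption_t", "production_t"]  # hypothetical column names


def size_to_tons(size_mm, legend=LEGEND):
    """Convert a measured symbol size to tons by linear interpolation between legend entries."""
    sizes = [s for s, _ in legend]
    if not sizes[0] <= size_mm <= sizes[-1]:
        raise ValueError(f"size {size_mm} mm is outside the legend range")
    i = bisect.bisect_left(sizes, size_mm)
    if sizes[i] == size_mm:
        return legend[i][1]
    (s0, t0), (s1, t1) = legend[i - 1], legend[i]
    return t0 + (t1 - t0) * (size_mm - s0) / (s1 - s0)


def append_city(writer, name, consumption_mm, production_mm=None):
    """Write one cities.csv row; a city with no production square gets 0 t."""
    production = 0 if production_mm is None else round(size_to_tons(production_mm))
    writer.writerow({
        "city": name,
        "consumption_t": round(size_to_tons(consumption_mm)),
        "production_t": production,
    })


# Illustrative entry session with made-up measurements, written to memory.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
append_city(writer, "ExampleCity", 3.5, 10.0)
append_city(writer, "OtherCity", 5.0)
```

If the map's symbols are drawn with area proportional to tonnage, interpolating on the squared size would be more faithful; the linear version is kept here only for simplicity.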

In a second step, to record the flows of coal transported from each city, one person selected a flow leaving the chosen city for another city and read aloud its destination, the quantity of coal carried, and the type of coal (each of the 10 colour-coded coal types on the map had been assigned a simple numeric code). The other person entered this information into a second CSV file dedicated to coal flows (flux.csv).
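One row of flux.csv can be modelled as a small record with basic validation, sketched below. The field names, the numbering of the coal types from 1 to 10, and the `"land"`/`"sea"` labels (the text records whether each flow is land-based or maritime) are assumptions for illustration; the real file's columns are not specified.

```python
import csv
import io
from dataclasses import dataclass, asdict

N_COAL_TYPES = 10  # the map distinguishes 10 colour-coded coal types


@dataclass
class Flow:
    origin: str
    destination: str
    tons: int        # annual tonnage read from the map
    coal_type: int   # hypothetical numeric code 1..10 standing in for a map colour
    mode: str        # hypothetical labels: "land" or "sea"

    def __post_init__(self):
        # Reject entries that could only come from a typing or reading error.
        if not 1 <= self.coal_type <= N_COAL_TYPES:
            raise ValueError(f"unknown coal type code {self.coal_type}")
        if self.mode not in ("land", "sea"):
            raise ValueError(f"unknown transport mode {self.mode!r}")
        if self.tons <= 0:
            raise ValueError("tonnage must be positive")


FIELDS = ["origin", "destination", "tons", "coal_type", "mode"]


def write_flows(flows, fh):
    """Write Flow records as CSV rows with a header line."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    for flow in flows:
        writer.writerow(asdict(flow))


# Illustrative use with made-up values, written to memory instead of flux.csv.
buf = io.StringIO()
write_flows([Flow("CityA", "CityB", 250_000, 3, "land"),
             Flow("CityA", "PortX", 80_000, 7, "sea")], buf)
```

Validating at entry time catches misheard codes immediately, while both people are still looking at the same spot on the map.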

Afterwards, we researched the historical name of each city online, identified its modern name where necessary, and entered its geographical coordinates into cities.csv so that the algorithm could place the city on the virtual map.

We made several simplifying choices when working with the map. In general, mines located in the immediate vicinity of a city were counted as part of that city. Unnamed mines were named after the nearest modern settlement. Flows were treated as constant between two agglomerations, even though the map sometimes shows them decreasing along the way (probably because the locomotives consumed coal and small quantities were delivered to villages en route). By convention, we kept the figure printed closest to the departure city. On the other hand, we did not simplify the origin of the coal (colour code), and we recorded whether each flow travelled by sea or by land. In general, flows of less than 20,000 tons per year were not taken into account: first, because their value is often not indicated on the map, as they are too small; and second, because their value is negligible compared with the other flows, the largest of which reach 14 million tons, or 700 times as much. Some railway nodes were also simplified.
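Two of these rules, keeping the figure closest to the departure city and dropping flows under 20,000 tons per year, can be expressed as a short filter. This is an illustrative sketch, not the code we used: it assumes each flow is stored with the list of figures printed along its route, ordered from departure to arrival, and the helper names are hypothetical.

```python
MIN_TONS = 20_000          # flows below 20,000 t/year were left out
LARGEST_FLOW = 14_000_000  # largest flows on the map, in t/year


def retained_tonnage(figures_along_route):
    """Keep the figure printed closest to the departure city.

    Assumes figures are ordered from departure to arrival.
    """
    if not figures_along_route:
        raise ValueError("no figure recorded for this flow")
    return figures_along_route[0]


def simplify(flows):
    """flows: dict mapping (origin, destination) -> figures read along the route.

    Returns (origin, destination) -> retained tonnage, without small flows.
    """
    result = {}
    for route, figures in flows.items():
        tons = retained_tonnage(figures)
        if tons >= MIN_TONS:
            result[route] = tons
    return result


# How many times larger the biggest flow is than the cut-off.
RATIO = LARGEST_FLOW // MIN_TONS
print(RATIO)  # 700
```

For example, a flow labelled 300,000, then 290,000, then 280,000 t along its route is kept as 300,000 t, while a 15,000 t flow is discarded.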

Data visualization

We then coded an algorithm that processes this dataset and converts it into an interactive visualization of coal-mining basins and consumption centres. Annual consumption and production are represented with a visual vocabulary analogous to that of the historical map, the size of each circle indicating consumption, production, or the importance of a transport hub. We also computed the net import-export balance of each city. The interactive maps were created with the folium Python library.
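A minimal sketch of the net balance and of a folium map with sized circles follows. Several choices here are assumptions rather than a description of our code: the balance is defined as imports minus exports (positive means net importer), circle area is made proportional to tonnage, and the dictionary keys, map centre and `build_map` helper are hypothetical. folium is imported inside `build_map` so the two helpers can be used without it.

```python
import math
from collections import defaultdict


def net_balance(flows):
    """flows: iterable of (origin, destination, tons).

    Returns city -> imports minus exports (positive = net importer).
    """
    balance = defaultdict(int)
    for origin, destination, tons in flows:
        balance[origin] -= tons
        balance[destination] += tons
    return dict(balance)


def circle_radius(tons, max_tons, max_radius=30.0):
    """Radius in pixels such that circle area is proportional to tonnage."""
    return max_radius * math.sqrt(tons / max_tons)


def build_map(cities, out_path="coal_map.html"):
    """cities: list of dicts with hypothetical keys name, lat, lon, consumption_t."""
    import folium  # imported here so the helpers above work without folium

    max_t = max(c["consumption_t"] for c in cities)
    m = folium.Map(location=[51.0, 10.0], zoom_start=6)  # roughly central Germany
    for c in cities:
        folium.CircleMarker(
            location=[c["lat"], c["lon"]],
            radius=circle_radius(c["consumption_t"], max_t),
            popup=f'{c["name"]}: {c["consumption_t"]:,} t/year',
            color="black",
            fill=True,
            fill_opacity=0.6,
        ).add_to(m)
    m.save(out_path)  # standalone HTML file that can be embedded in a website
    return m
```

Scaling the radius by the square root keeps large consumers from visually swamping the map, which is one common convention for proportional symbols.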

The algorithm also processes the dynamic part of the dataset, the transport flows, and converts it into both static and dynamic visualizations of coal transport flows and routes. In the static representation, the thickness of each path reflects the volume of coal carried on that segment. The result is a tree-like network whose branches reach the borders of the Empire and grow thinner as they move away from the production centres. Land (rail) and sea networks can also be displayed separately. For these representations, the algorithm superimposes several lines of different intensity and thickness to create a halo effect.
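The per-segment thickness and the halo can be sketched as follows. This assumes each flow is stored with the list of cities it passes through, that traffic in both directions on a segment is summed, and that thickness grows linearly with tonnage; the halo is modelled as a stack of progressively wider, fainter lines. All function names and scaling constants are hypothetical.

```python
from collections import defaultdict


def segment_loads(routed_flows):
    """routed_flows: iterable of (path, tons), path being the list of cities crossed.

    Returns {frozenset({a, b}): total tons}; both directions are summed.
    """
    loads = defaultdict(int)
    for path, tons in routed_flows:
        for a, b in zip(path, path[1:]):
            loads[frozenset((a, b))] += tons
    return dict(loads)


def line_weight(tons, max_tons, min_w=1.0, max_w=12.0):
    """Linear mapping from tonnage to line thickness in pixels."""
    return min_w + (max_w - min_w) * tons / max_tons


def halo_layers(weight, n=3):
    """(weight, opacity) pairs to draw in order: widest and faintest first,
    the core line last, so the core sits on top of its halo."""
    return [(weight * (1 + k), 0.8 / (1 + k)) for k in range(n - 1, -1, -1)]
```

Each `(weight, opacity)` pair could then be passed to the `weight` and `opacity` options of a `folium.PolyLine`; drawing the rail and sea subsets into separate layers gives the two networks separately.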

To obtain a dynamic visualization, the algorithm computes, for each transport line, the number of trains necessary to carry the annual amount of coal from one city to the next, and distributes their departures uniformly at random over the year. The result is a fictitious train schedule spanning one year, with realistic and plausible frequencies and routes. The simulation runs at a resolution of five minutes and produces a total of 334,939 fictitious journeys over the year.

Because we found no data on the loads of German freight trains, we estimated the amount of coal carried per train from secondary literature on American freight transport. The American Railroad Journal of August 1, 1842 praises a train that hauled 200 tons of freight from Albany to Boston[1], while in 1903 an average American freight train could carry a load of about 391 tons[2]. Our estimate is only an approximation intended to give the correct order of magnitude: since the German rail network was considered well developed by American standards in the second half of the 19th century[3], we adopted the arbitrary figure of 330 tons as the average load of a freight train.
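The schedule generation described above can be sketched in a few lines. This is an illustration under stated assumptions, not our implementation: it uses the 330-ton average load and the five-minute resolution from the text, assumes a 365-day year, rounds the number of trains up so that the whole tonnage is carried, and uses a fixed random seed for reproducibility. The names `trains_needed` and `schedule` are hypothetical.

```python
import math
import random

TONS_PER_TRAIN = 330                             # average load adopted in the text
SLOT_MINUTES = 5                                 # simulation resolution
SLOTS_PER_YEAR = 365 * 24 * 60 // SLOT_MINUTES   # 105,120 five-minute slots


def trains_needed(annual_tons, load=TONS_PER_TRAIN):
    """Number of trains required to carry the annual tonnage (rounded up)."""
    return math.ceil(annual_tons / load)


def schedule(lines, seed=0):
    """lines: dict mapping (origin, destination) -> annual tonnage.

    Returns a time-sorted list of (departure_minute, origin, destination),
    each departure drawn uniformly at random among the year's 5-minute slots.
    """
    rng = random.Random(seed)
    departures = []
    for (origin, destination), tons in lines.items():
        for _ in range(trains_needed(tons)):
            slot = rng.randrange(SLOTS_PER_YEAR)
            departures.append((slot * SLOT_MINUTES, origin, destination))
    departures.sort()
    return departures
```

With these assumptions, one of the largest flows, 14 million tons a year, corresponds to `trains_needed(14_000_000)`, that is 42,425 trains, or more than a hundred a day.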

The simulation is rendered as a video. To create it, the HTML frames are first captured automatically as screenshots and saved as PNG images, which are then assembled into a video. The video shows trains as moving dots whose colours follow the colour code used on the map for the different coal-producing areas. Ships carrying coal along sea routes appear as small moving triangles.
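To draw a frame, each train that is under way at that moment needs a position. The sketch below assumes a straight-line path between the two cities and a fixed travel time per trip, neither of which is specified in the text (a real rendering might follow the railway geometry instead); the `Trip` class, `position_at`, `frame_positions` and the `frame_000000.png` naming pattern are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    start_min: int       # departure time in minutes since the start of the year
    duration_min: int    # travel time; hypothetical, not given in the text
    origin: tuple        # (lat, lon)
    destination: tuple   # (lat, lon)

    def __post_init__(self):
        if self.duration_min <= 0:
            raise ValueError("travel time must be positive")


def position_at(trip, t_min):
    """Linearly interpolated (lat, lon) at time t_min, or None if not travelling."""
    if not trip.start_min <= t_min <= trip.start_min + trip.duration_min:
        return None
    f = (t_min - trip.start_min) / trip.duration_min
    (lat0, lon0), (lat1, lon1) = trip.origin, trip.destination
    return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))


def frame_positions(trips, t_min):
    """Positions of all dots visible in the frame drawn at time t_min."""
    return [p for p in (position_at(tr, t_min) for tr in trips) if p is not None]


def frame_filename(index):
    """Zero-padded names keep the PNG screenshots in order for video assembly."""
    return f"frame_{index:06d}.png"
```

Stepping `t_min` forward by a fixed amount per frame and rendering `frame_positions` into one HTML map per step yields the sequence of frames that is then screenshotted.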

The entire algorithm was coded in Python 3. The interactive maps are in HTML format and can therefore easily be embedded in any website. Finally, we created a website that lets users interact with the virtually recreated map.

[1] The Brooklyn Historic Railway Association, Brooklyn, NY. <http://www.brooklynrail.ne/science_of_railway_locomotion.html>

[2] Tom Morrison, The American Steam Locomotive in the Twentieth Century, McFarland, 2018, p. 37.

[3] Toni Pierenkemper, Richard H. Tilly, The German Economy During the Nineteenth Century, pp. 59-70.

Quality assessment

To assess the quality of our work, we decided to randomly test about 10% of our data. To do so, we swapped the roles we had held during data entry. One of us arbitrarily selected cities on the original historical map, equal in number to 10% of the cities on our virtual map, and checked whether each city was present on the virtual map, correctly located, and assigned the correct production and consumption figures. In the same way, one of us arbitrarily selected flows on the original map, equal in number to 10% of the flows on our virtual map, and checked whether each flow was present and correctly located on the virtual map, and whether its quantity and coal type were correct.

For both tests, on cities and on flows, we classified each result as correct, imprecise, or missing. Because we swapped roles relative to data entry, this process also tested the effect of the arbitrary selection of the visually most prominent cities and flows that we made to declutter the map. The charts below show that our level of uncertainty is high; this can be attributed both to the imprecision of manual data entry and to the somewhat random nature of a selection based on the subjective criterion of visual prominence. Missing entries can result from the two of us making different selection choices.
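The tallying behind the charts can be sketched as below. It only illustrates the bookkeeping: the sample size is 10% of the items on the virtual map, and each checked item carries one of the three labels used above. The helper names are hypothetical, and the labels in the test are invented for illustration, not our actual results.

```python
from collections import Counter

CATEGORIES = ("correct", "imprecise", "missing")


def sample_size(total, share=0.10):
    """Number of items to check for a given share of the dataset (at least 1)."""
    return max(1, round(total * share))


def summarize(results):
    """results: one category label per checked city or flow.

    Returns the percentage of each category.
    """
    if not results:
        raise ValueError("no results to summarize")
    counts = Counter(results)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    n = len(results)
    return {c: 100 * counts[c] / n for c in CATEGORIES}
```

Keeping the city and flow checks as two separate label lists produces one chart per test, as in the figures below.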

Figure (Cities.jpg): Accuracy quantification of production/consumption data

Figure (Routes.jpg): Accuracy quantification of transport route data

The reported errors are also partly explained by the high complexity of the map and must be weighed against its high density of information. Moreover, some journeys between two cities comprise several different coal flows, and a single imprecise flow is enough for us to classify the data for the whole journey as imprecise, which is a rather strict criterion. In addition, the colours of the map are sometimes difficult to distinguish, perhaps because of the condition of the colours on the original map or the lighting during scanning. This is particularly the case for coal flows shown in brown or beige, which are sometimes hard to tell apart. Finally, the concentration of information sometimes affects the readability of the map itself. Depending on interpretation, some figures may be attributed to different points along a route; this interpretive variability necessarily affects the results of our verification, even if it does not inherently affect the quality of the extracted data. In short, these data should be understood as the product of a partly subjective interpretation and simplification: they carry a degree of uncertainty but are not inherently false.

As our work is based on two data files, it is important to check that they are perfectly consistent: every entry in one file must link correctly to the corresponding entries in the other, and vice versa. This check is carried out directly within the program. We can thus confirm that 100% of the entries in the flow file (flux.csv) match departure and arrival cities in the city file (cities.csv). Since the map is interactive, the coordinates of the cities can also be verified visually. We carried out this verification and can confirm that the coordinates are accurate in 100% of cases.
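A cross-file check of this kind can be sketched with the standard `csv` module. The column names (`city`, `lat`, `lon`, `origin`, `destination`) are assumptions, since the real headers are not given. The coordinate check only catches missing or out-of-range values; it complements, rather than replaces, the visual verification on the interactive map.

```python
import csv
import io


def load_rows(fh):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(fh))


def check_links(cities, flows):
    """Return (flows whose origin or destination has no city entry,
    cities that appear in no flow). Column names are hypothetical."""
    names = {row["city"] for row in cities}
    unmatched_flows = [
        row for row in flows
        if row["origin"] not in names or row["destination"] not in names
    ]
    used = {row["origin"] for row in flows} | {row["destination"] for row in flows}
    unused_cities = sorted(names - used)
    return unmatched_flows, unused_cities


def check_coordinates(cities):
    """Names of cities whose latitude/longitude are missing or out of range."""
    bad = []
    for row in cities:
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, TypeError, ValueError):
            bad.append(row.get("city"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            bad.append(row.get("city"))
    return bad


# Illustrative in-memory files with made-up values.
cities = load_rows(io.StringIO("city,lat,lon\nCityA,51.4,7.0\nCityB,53.5,10.0\n"))
flows = load_rows(io.StringIO(
    "origin,destination,tons\nCityA,CityB,250000\nCityA,CityC,40000\n"))
unmatched, unused = check_links(cities, flows)
```

Here the second flow is reported because CityC has no entry in the city file; an empty `unmatched` list and an empty `unused` list together correspond to the 100% linkage reported above.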