When my local hackerspace held its first HackJAM dedicated to Open Data apps, a couple of the coding groups involved found that they had a problem with the data. The files that they extracted from the City of Windsor’s Open Data catalogue seemed to come out all weird in their maps. Turns out they didn’t realize that the City used ESRI software with a map projection that was different from the one generally used by web maps.
Now, when I told this story to a friend also interested in web maps, he was kinda shocked. He thought that a long/lat coordinate is a long/lat coordinate. How could a map get that wrong? And the answer is: when the map and the elements on the map are not using the same projection.
And the difference between using a projection that is geared towards the whole of the US and one that is geared towards a small part of it can be as much as half a mile, at least according to this post dedicated to Choosing the Right Map Projection.
But before we get started, let’s unpack ESRI shapefiles first. From Finding and Making Sense of Geospatial Information on the Internet:
A “shapefile” actually consists of at least 3 files. All the files will have the same name except for the postfix. At a minimum you need the *.shp (the geometries), *.dbf (the tabular data), and the *.shx (links the two files together) for a shapefile to be whole. There can be many other extensions included, with the most important being a *.prj which contains the projection information…
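To make that file list concrete, here is a minimal sketch (standard library only) that checks a zipped shapefile for the three required parts and pulls out the text of the *.prj, if there is one. The function name `inspect_shapefile_zip` is my own invention for illustration, not something from the quoted guide or from any GIS package.

```python
import zipfile
from pathlib import PurePosixPath

# The three parts a shapefile needs to be whole, per the description above.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def inspect_shapefile_zip(zip_source):
    """Return (missing_required_extensions, prj_text_or_None).

    zip_source may be a path or a file-like object holding a zip archive.
    Hypothetical helper, written for illustration only.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Remember the first archive member seen for each file extension.
        by_extension = {}
        for name in archive.namelist():
            extension = PurePosixPath(name).suffix.lower()
            by_extension.setdefault(extension, name)

        missing = sorted(REQUIRED_EXTENSIONS - set(by_extension))

        prj_text = None
        if ".prj" in by_extension:
            raw = archive.read(by_extension[".prj"])
            # A .prj is a short text file describing the coordinate system.
            prj_text = raw.decode("utf-8", errors="replace").strip()

    return missing, prj_text
```

Calling `inspect_shapefile_zip("some_download.zip")` (the file name is a placeholder) gives back a list of any missing required extensions plus the projection description, which is the same information a GIS reads when it decides how to draw the layer.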
So we’re going to open up some shapefiles from the Open Data Catalogue of the City of Windsor to see what projection they are in. But first, we’re going to learn what Google Maps and most other web mapping systems use as their projection:
The European Petroleum Survey Group (EPSG) has come up with numbers for almost any projection you will deal with in normal everyday mapping. Given an EPSG number you know precisely which coordinate system you are dealing with and which one you want to go to. Fortunately, some great members of the geospatial community created a site, SpatialReference.org, that lists almost all the EPSG numbers along with other ways of representing those same coordinate systems (such as in well-known text). Bookmark this site if you plan to work with geospatial data a lot.
If you write an application that gets GPS coordinates from your phone, a browser that supports geolocation, or the defaults from a handheld GPS unit, you will be getting your data as Lat, Long in the WGS-84 datum. The EPSG for the 2D version of this is EPSG:4326 and 3D is EPSG:4979. The Google and Bing Maps projection is EPSG:3857 and actually uses a spherical datum rather than an ellipsoid.
Incidentally, EPSG:3857 also goes by WGS84 Web Mercator (Auxiliary Sphere) or EPSG:900913 (900913 is leet for google).
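To see how different the two sets of numbers are, here is a rough sketch of the spherical Mercator formulas that sit behind EPSG:3857, written with nothing but the standard library. The function names are mine, and the radius is the WGS84 semi-major axis (6378137 m) treated as the radius of a sphere. For real work you would let a projection library do this; the point is only that the same spot on earth is a pair of degrees in EPSG:4326 and a pair of large metre values in EPSG:3857.

```python
import math

# WGS84 semi-major axis in metres, used as the sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 metres (spherical formulas)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def web_mercator_to_lonlat(x, y):
    """Convert EPSG:3857 metres back to EPSG:4326 degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(
        2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2
    )
    return lon_deg, lat_deg


# Approximate location of Windsor, Ontario (an assumption for illustration).
windsor_x, windsor_y = lonlat_to_web_mercator(-83.0, 42.3)
```

Feed a web map that expects metres a pair of degrees (or the other way around) and your features land in the wrong place entirely, which is pretty much what happened at the HackJAM.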
Does it make a difference? Well, thanks to the consideration of the City of Windsor, we can see the difference! You see, their open data catalogue gives two sets of shapefiles for the Ward Boundaries. There is http://www.citywindsor.ca/opendata/Lists/OpenData/Attachments/9/Municipal_Ward_Boundaries_2010_LL84.zip which is in WGS84. Once you figure out how to open the project in a GIS like QGIS, it looks like this:
But if you open the other Ward Boundary zip, http://www.citywindsor.ca/opendata/Lists/OpenData/Attachments/9/2010_Municipal_Ward_Boundaries.zip, that project and its projection look like this:
What’s the projection that the second project is using? If we look at the file in QGIS under Layer > Properties > General, we can learn that they are using…
And what does *that* mean?
Let’s go back to the Mapping Small Areas section of the Choosing the Right Map Projection post…
For a regional map, say a few counties or even many smaller states, a UTM (Universal Transverse Mercator, not the same as a Mercator, confusingly) projection might be a good choice. One of the biggest advantages of a UTM is that measuring distances between two points is a snap. Measuring distances between points in more familiar latitude and longitude degrees requires some pretty complex math, though modern software tools often have distance calculations built in. But in UTM, there are no degrees; the map units are measured in meters. That makes for high accuracy, easy math and easy conversions.
The trade-off is that this trick only works over relatively small areas. UTM keeps distortion down by dividing the earth into 60 zones, each of which is about 300-475 miles wide east-to-west, depending on what latitude you’re at. Inside that zone, and usually into the next zone east or west, measurements are quite accurate. But that accuracy fades the farther away from the origin you get. That means you need to know which zone your map area is in, and it makes UTM a poor choice for national or world maps.
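Since you need to know which zone you are in, here is a small sketch of the usual arithmetic: 60 zones of 6 degrees each, numbered eastward starting at 180°W. The function names are my own, the special-case zones around Norway and Svalbard are ignored, and the EPSG numbers returned are the WGS84-based UTM codes (326xx north, 327xx south), which is not necessarily the datum any given city's files use.

```python
import math


def utm_zone(lon_deg):
    """Return the UTM zone number (1-60) for a longitude in degrees."""
    # Normalise to the range [-180, 180) so that 180 wraps around to zone 1.
    lon = ((lon_deg + 180.0) % 360.0) - 180.0
    return int(math.floor((lon + 180.0) / 6.0)) + 1


def utm_central_meridian(zone):
    """Return the central meridian of a zone, in degrees of longitude."""
    return zone * 6 - 183


def wgs84_utm_epsg(lon_deg, lat_deg):
    """Return the EPSG code of the WGS84 / UTM zone containing a point."""
    base = 32600 if lat_deg >= 0 else 32700  # northern vs southern hemisphere
    return base + utm_zone(lon_deg)


# Windsor, Ontario sits at roughly 83.0 W, 42.3 N (approximate values).
windsor_zone = utm_zone(-83.0)  # zone 17
```

Zone 17 is centred on 81°W, so Windsor is a couple of degrees from the middle of its zone, comfortably inside the area where UTM measurements stay accurate.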
Projections make a difference, and so do the tools that one uses.
The participants’ guide for the Open Data HackJAM recommended using the pyshp package. But pyshp doesn’t deal with projections at all! That’s when you need PROJ (available from Python through the pyproj package), that is, if you want to use Python for your geospatial work.
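As a rough sketch of what that looks like, assuming the pyproj package (the Python bindings for PROJ) is installed: the helper below, whose name is mine, takes a list of (x, y) pairs and reprojects them between two coordinate systems named by EPSG code. I am leaving the source system as a parameter, because what to pass depends on what your file's *.prj says; the example call only uses the two codes discussed above.

```python
def reproject_points(points, src_crs, dst_crs="EPSG:4326"):
    """Reproject a list of (x, y) pairs from src_crs to dst_crs.

    Hypothetical helper for illustration. Requires pyproj; the import is
    done inside the function so this file loads even without it installed.
    """
    from pyproj import Transformer

    # always_xy=True keeps coordinates in (x, y) = (longitude, latitude)
    # order, regardless of the axis order the CRS definition declares.
    transformer = Transformer.from_crs(src_crs, dst_crs, always_xy=True)
    return [transformer.transform(x, y) for x, y in points]


# Example call (needs pyproj): lon/lat degrees to Web Mercator metres.
# web_points = reproject_points([(-83.0, 42.3)], "EPSG:4326", "EPSG:3857")
```

With pyshp you would read the geometries first (each shape exposes its vertices as a list of points) and then hand those points to something like this before putting them on a web map.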