How to benefit from the open building datasets of Microsoft, Google and OSM

OpenStreetMap, built through crowdsourcing, and Microsoft Bing and Google, built with advanced image processing, AI and machine learning, provide building data on a global scale and, with it, enormous opportunities.

Microsoft and Google have been investing in mapping for a long time. Both tech giants have released open building datasets that can be used for population estimation, urban planning, risk analysis, environmental studies and humanitarian purposes.

When we reviewed the open building datasets from Microsoft, Google and OpenStreetMap, we found that each source has different advantages and considerations. The first point to consider is the accuracy and integrity of the data.

This matters especially in developing regions, where modern, traditional and shanty houses are mixed together and newly developing housing areas change faster than established ones.

We would like to share some of what we learned about these open building datasets, focusing on Phnom Penh, Cambodia. According to the latest Cambodian census (2019), Phnom Penh has an average annual population growth rate of 4.9%, and the city currently has a population of nearly 2.5 million.
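To put the growth figure in perspective, the compound-growth arithmetic below is a minimal sketch. It assumes, for illustration only, that the 4.9% rate stays constant and takes the approximate 2.5 million figure above as the starting population; it is not a forecast.

```python
import math


def project_population(current: float, annual_rate: float, years: int) -> float:
    """Compound growth: P(t) = P0 * (1 + r) ** t."""
    return current * (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    # Illustrative assumptions: constant 4.9% rate and a base of ~2.5 million.
    rate = 0.049
    base = 2_500_000
    for years in (1, 5, 10):
        print(f"after {years:>2} years: {project_population(base, rate, years):,.0f}")
    print(f"doubling time: {doubling_time(rate):.1f} years")
```

At a constant 4.9%, the population would double in roughly 14.5 years, which helps explain why building datasets for a fast-growing city go out of date quickly.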

As the satellite image below shows, Microsoft Bing provides regular building polygons (green) in the developed areas near the Mekong River (zone 3 on the map). Moving away from the river towards the centre, the polygons become quite sparse. Zone 1 is estimated to consist of traditional and shanty buildings, whose shapes are difficult to delineate. However, as zone 2 shows, there are still gaps in the Microsoft Bing building data even where the buildings have regular shapes.
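One way to put numbers on these visual impressions is to count each dataset's buildings per square kilometre in every zone. The sketch below is a minimal, dependency-free illustration: it assumes footprints and zone boundaries have already been loaded as coordinate lists in a projected, metre-based coordinate system, and the zone names and toy squares are hypothetical. In practice a library such as Shapely or GeoPandas would handle the geometry.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # exterior ring in metres; first vertex not repeated


def shoelace_area(poly: Polygon) -> float:
    """Planar polygon area (square metres) via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def centroid(poly: Polygon) -> Point:
    """Mean of the vertices: a cheap stand-in for the true area centroid."""
    xs, ys = zip(*poly)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray cast towards +x."""
    x, y = pt
    inside = False
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def density_by_zone(buildings: List[Polygon], zones: Dict[str, Polygon]) -> Dict[str, float]:
    """Buildings per km² in each zone, assigning each building by its centroid."""
    densities = {}
    for name, zone in zones.items():
        count = sum(point_in_polygon(centroid(b), zone) for b in buildings)
        densities[name] = count / (shoelace_area(zone) / 1_000_000)
    return densities


def square(x: float, y: float, size: float = 10.0) -> Polygon:
    """Hypothetical square footprint with its lower-left corner at (x, y)."""
    return [(x, y), (x + size, y), (x + size, y + size), (x, y + size)]


# Hypothetical 1 km x 1 km zones and three toy footprints.
zones = {
    "zone_a": [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
    "zone_b": [(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)],
}
buildings = [square(100, 100), square(500, 500), square(1500, 200)]
print(density_by_zone(buildings, zones))
```

Running the same function on each dataset and comparing densities zone by zone gives a rough, repeatable measure of where coverage thins out; assigning buildings by centroid avoids double-counting footprints that straddle a zone boundary.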

Microsoft Bing Building Data – Centre of Phnom Penh

Looking at the Microsoft Bing building polygons in detail, we can see some deformities. The mix of high-rise and low-rise buildings and irregular building shapes seems to degrade polygon accuracy, as shown below.
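Deformities like these can be screened for automatically. The sketch below is one possible approach, not the method used to produce the maps in this article: it computes two common shape metrics per footprint, Polsby-Popper compactness (4πA/P²) and solidity (polygon area divided by convex-hull area), and flags footprints whose solidity falls below a threshold. The 0.9 threshold is a hypothetical value that would need tuning against manually checked buildings, and planar coordinates in metres are assumed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # exterior ring, first vertex not repeated


def area(poly: Polygon) -> float:
    """Shoelace area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def perimeter(poly: Polygon) -> float:
    """Sum of edge lengths around the ring."""
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly)))


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_metrics(poly: Polygon) -> dict:
    """Polsby-Popper compactness (4*pi*A/P^2) and solidity (A / convex-hull area)."""
    a = area(poly)
    return {
        "compactness": 4 * math.pi * a / perimeter(poly) ** 2,
        "solidity": a / area(convex_hull(poly)),
    }


def flag_irregular(footprints: List[Polygon], min_solidity: float = 0.9) -> List[int]:
    """Indices of footprints whose solidity is below a (hypothetical) threshold."""
    return [i for i, fp in enumerate(footprints) if shape_metrics(fp)["solidity"] < min_solidity]


regular = [(0, 0), (10, 0), (10, 10), (0, 10)]
notched = [(0, 0), (10, 0), (10, 10), (5, 10), (5, 5), (0, 5)]
print(shape_metrics(regular))  # solidity 1.0, compactness pi/4
print(flag_irregular([regular, notched]))
```

Solidity is 1.0 for any convex footprint, so it picks up notches and jagged outlines without penalising long, thin but regular buildings, which compactness alone would flag.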

Google Building Data – Centre of Phnom Penh

The Google building data shows no gaps within the centre of Phnom Penh; Google's image processing algorithms work in both mixed and regular areas. However, zooming in shows that in some areas it captures fewer buildings than Microsoft Bing.

As the pictures below show, Google captures more buildings than Microsoft in underdeveloped, mixed areas. Looking at the layout of the polygons, we do not encounter the deformities seen in the Microsoft data; Google's building polygons are smoother and more accurate.

However, Google's overall building count is lower than Microsoft Bing's. In developed areas in particular, Microsoft Bing provides a very comprehensive building dataset. As the pictures below show, Google's building polygons have many deficiencies near the Mekong River, where luxury residences are located.
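Gaps like these can be located by cross-matching the two datasets. The sketch below is a minimal illustration rather than a production conflation method: it treats a building as present in the other dataset when its centroid falls inside one of that dataset's footprints, and reports the unmatched footprints on each side. Coordinates are assumed to be in a shared planar coordinate system, and the toy footprints are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # exterior ring, first vertex not repeated


def centroid(poly: Polygon) -> Point:
    """Mean of the vertices: a cheap stand-in for the true area centroid."""
    xs, ys = zip(*poly)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray cast towards +x."""
    x, y = pt
    inside = False
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def unmatched(source: List[Polygon], target: List[Polygon]) -> List[int]:
    """Indices of `source` footprints whose centroid lies inside no `target` footprint."""
    return [
        i for i, poly in enumerate(source)
        if not any(point_in_polygon(centroid(poly), t) for t in target)
    ]


def square(x: float, y: float, size: float = 10.0) -> Polygon:
    """Hypothetical square footprint with its lower-left corner at (x, y)."""
    return [(x, y), (x + size, y), (x + size, y + size), (x, y + size)]


# Hypothetical footprints for the same block from two providers.
google = [square(0, 0), square(100, 0)]
microsoft = [square(1, 1), square(200, 0)]
print("only in Google:", unmatched(google, microsoft))
print("only in Microsoft:", unmatched(microsoft, google))
```

The nested loop is O(n × m); for city-scale datasets a spatial index such as Shapely's `STRtree` would be needed to keep the comparison fast.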

Google Building Data

Google Building Data

Microsoft Bing Building Data

Microsoft Bing Building Data

Unlike the Google and Microsoft Bing building data, which are created with advanced image processing algorithms, deep learning and AI, OpenStreetMap data is crowdsourced. It is one of the most important crowdsourced sources of GIS data. It offers high polygon accuracy and additional information such as a building's name or type. However, although OpenStreetMap provides reliable and accurate data for some areas, its coverage is limited in developing cities like Phnom Penh.
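OpenStreetMap's extra information comes from tags: building footprints carry a `building=*` tag and, where mappers have added one, a `name=*` tag. The sketch below summarises those tags from a response in the Overpass API's JSON output format (an `elements` list whose items have `type`, `id` and `tags`). The sample elements and the building name are hypothetical, and the query that would produce such a response is not shown.

```python
import json
from collections import Counter
from typing import Dict, List, Optional


def osm_building_attributes(overpass_json: str) -> List[Dict[str, Optional[str]]]:
    """Extract building type and name from Overpass API JSON output."""
    data = json.loads(overpass_json)
    rows = []
    for element in data.get("elements", []):
        tags = element.get("tags", {})
        if "building" not in tags:
            continue  # skip elements that are not buildings
        rows.append({
            "osm_id": f"{element['type']}/{element['id']}",
            "building": tags["building"],  # e.g. "yes", "apartments", "commercial"
            "name": tags.get("name"),  # None when no name has been mapped
        })
    return rows


# Hypothetical response with two buildings and one non-building element.
SAMPLE = json.dumps({"elements": [
    {"type": "way", "id": 1, "tags": {"building": "apartments", "name": "Hypothetical Tower"}},
    {"type": "way", "id": 2, "tags": {"building": "yes"}},
    {"type": "node", "id": 3, "tags": {"amenity": "cafe"}},
]})

rows = osm_building_attributes(SAMPLE)
print(Counter(r["building"] for r in rows))
print(sum(r["name"] is not None for r in rows), "of", len(rows), "buildings are named")
```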

OpenStreetMap Building Data – Centre of Phnom Penh

It is very rare to find building data for traditional areas in OpenStreetMap; it mainly covers high-rise landmark buildings or recently developed, popular areas.

Using these three data sources together helps to build a good building database. Although it is important to review all of them, we also need to look for other sources to enrich the data and assess its completeness. Other useful sources include property developer and project websites, property listings and recent satellite imagery.
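The combination step and the completeness check can be sketched together. In the minimal illustration below, sources are processed in a priority order and a footprint is kept only if its centroid is not already inside a kept footprint; completeness is then the share of independently collected reference locations (for example, buildings digitised from recent satellite imagery or geocoded property listings) that fall inside some footprint. The priority order, source names and toy data are hypothetical choices rather than a recommendation, and planar coordinates are assumed.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # exterior ring, first vertex not repeated


def centroid(poly: Polygon) -> Point:
    """Mean of the vertices: a cheap stand-in for the true area centroid."""
    xs, ys = zip(*poly)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray cast towards +x."""
    x, y = pt
    inside = False
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def merge_sources(sources: Dict[str, List[Polygon]], priority: List[str]) -> List[Tuple[str, Polygon]]:
    """Keep a footprint only if its centroid is not inside any footprint already kept."""
    merged: List[Tuple[str, Polygon]] = []
    for name in priority:
        for poly in sources[name]:
            c = centroid(poly)
            if not any(point_in_polygon(c, kept) for _, kept in merged):
                merged.append((name, poly))
    return merged


def completeness(footprints: List[Polygon], reference_points: List[Point]) -> float:
    """Share of reference building locations that fall inside some footprint."""
    if not reference_points:
        return 0.0
    hits = sum(any(point_in_polygon(p, f) for f in footprints) for p in reference_points)
    return hits / len(reference_points)


def square(x: float, y: float, size: float = 10.0) -> Polygon:
    """Hypothetical square footprint with its lower-left corner at (x, y)."""
    return [(x, y), (x + size, y), (x + size, y + size), (x, y + size)]


sources = {
    "osm": [square(0, 0)],
    "google": [square(1, 1), square(50, 0)],
    "microsoft": [square(2, 2), square(51, 1), square(100, 0)],
}
merged = merge_sources(sources, priority=["osm", "google", "microsoft"])
reference = [(5, 5), (55, 5), (105, 5), (300, 300)]  # hypothetical checked locations
print([name for name, _ in merged])
print(completeness([poly for _, poly in merged], reference))
```

Computing `completeness` for each source separately, before merging, shows how much each one contributes to the combined database.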

The CII data team not only identifies housing projects, condos, high-rise buildings and new housing developments by working across all of these data sources, but also enriches the building database with further attributes such as project names, total units, completion year and price levels. If you would like to access the enriched building datasets, please feel free to contact us.
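Enrichment of this kind is essentially a spatial join. The sketch below attaches attributes from point records, such as project listings, to the building footprint that contains each record's location, and returns the records that matched no footprint for manual review. The record fields (`project_name`, `total_units`, `completion_year`) and the data are hypothetical, planar coordinates are assumed, and this is not a description of CII's actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # exterior ring, first vertex not repeated


@dataclass
class Building:
    building_id: str
    footprint: Polygon
    attributes: Dict[str, Any] = field(default_factory=dict)


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray cast towards +x."""
    x, y = pt
    inside = False
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def enrich(buildings: List[Building], records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy each record's attributes onto the first building containing its (x, y) location.

    Returns the records that fell inside no footprint."""
    leftovers = []
    for record in records:
        location = (record["x"], record["y"])
        target = next((b for b in buildings if point_in_polygon(location, b.footprint)), None)
        if target is None:
            leftovers.append(record)
            continue
        target.attributes.update({k: v for k, v in record.items() if k not in ("x", "y")})
    return leftovers


buildings = [
    Building("b1", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    Building("b2", [(20, 0), (30, 0), (30, 10), (20, 10)]),
]
records = [  # hypothetical listing records
    {"x": 5, "y": 5, "project_name": "Project A", "total_units": 120, "completion_year": 2021},
    {"x": 100, "y": 100, "project_name": "Project B"},
]
leftovers = enrich(buildings, records)
print(buildings[0].attributes)
print(leftovers)
```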

Building Dataset in CII solutions – Phnom Penh