“Keepin' it fresh (and good)!” - Continuous Ingestion of OSM Data at Facebook
Sunday, 11:00 AM – 20 min
Building on our work on Mobius Logical Changesets (presented at SotM US 2018), we have created an automated ingestion and integrity framework for OSM data that lets us selectively update parts of the map instead of swapping in a full snapshot all at once.
Decomposing a large set of changes in this way lets us rapidly ingest our own additions to the map, focus on geographical areas of importance to downstream products, and quickly apply hotfixes whenever egregious problems arise.
With millions of tiny changes happening every week, we have built a system based on per-feature approval and preprocessing that lets us ingest changes at scale, with rules that automatically process logical changesets and enforce integrity constraints (e.g. anti-vandalism and anti-profanity checks).
Due to the contextual nature of some of the changes in OpenStreetMap, the system combines human approval, necessary for highly visible features such as names of large administrative areas, with automated AI/ML-based approval: for example, using computer vision techniques to reconcile newly created features against satellite imagery ground truth, or applying NLP techniques to determine whether other user-visible string changes are sensible and valid. These components are combined to create a continuous ingest-validate-deploy cycle for OSM map data.
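
To make the per-feature approval idea concrete, here is a minimal sketch of how rule-based routing between automatic approval, rejection, and human review might look. It is an illustration only, not the framework described in this talk: `FeatureChange`, `Decision`, `profanity_rule`, `admin_name_rule`, `evaluate`, the placeholder word list, and the `admin_level <= 4` threshold are all assumptions made for the example. The only real-world details used are the OSM tags `name`, `boundary=administrative`, and `admin_level`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class FeatureChange:
    """One changed feature inside a logical changeset (hypothetical shape)."""
    feature_id: str
    old_tags: dict
    new_tags: dict


# A rule returns a Decision when it has an opinion, or None to defer.
Rule = Callable[[FeatureChange], Optional[Decision]]

# Placeholder word list; a real profanity check would be far richer.
BLOCKED_WORDS = {"badword"}


def profanity_rule(change: FeatureChange) -> Optional[Decision]:
    """Reject a change whose new name contains a blocked word."""
    words = change.new_tags.get("name", "").lower().split()
    if any(word in BLOCKED_WORDS for word in words):
        return Decision.REJECT
    return None


def admin_name_rule(change: FeatureChange) -> Optional[Decision]:
    """Send name changes on large administrative areas to a human.

    Treating admin_level <= 4 as "large" is an assumption for this sketch.
    """
    if change.new_tags.get("boundary") != "administrative":
        return None
    level = change.new_tags.get("admin_level", "")
    name_changed = change.old_tags.get("name") != change.new_tags.get("name")
    if level.isdigit() and int(level) <= 4 and name_changed:
        return Decision.HUMAN_REVIEW
    return None


def evaluate(change: FeatureChange, rules: Sequence[Rule]) -> Decision:
    """Apply rules in order; the first rule with an opinion wins."""
    for rule in rules:
        decision = rule(change)
        if decision is not None:
            return decision
    # Assumption: changes that no rule flags are approved automatically.
    return Decision.APPROVE


RULES = [profanity_rule, admin_name_rule]

vandalism = FeatureChange("way/1", {"name": "Main St"}, {"name": "badword St"})
country = FeatureChange(
    "relation/2",
    {"boundary": "administrative", "admin_level": "2", "name": "A"},
    {"boundary": "administrative", "admin_level": "2", "name": "B"},
)
cafe = FeatureChange("node/3", {}, {"amenity": "cafe", "name": "Cafe"})

for change in (vandalism, country, cafe):
    print(change.feature_id, evaluate(change, RULES).value)
```

Because each feature is evaluated independently, a single rejected or escalated feature need not block the other changes around it. In this sketch the order of the rules matters, and the automatic-approval fallthrough stands in for whatever ML-based checks a production system would apply before accepting a change.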