
Open data · 8 min read

Querying Wikidata for ports — a SPARQL cookbook

Worked examples of SPARQL queries against the Wikidata endpoint to retrieve port data, with notes on common pitfalls.

Wikidata is the structured-data sister project of Wikipedia, and it carries a remarkable amount of port and harbour data — including UN/LOCODEs, coordinates, country, port type, and links to the Wikipedia article in dozens of languages. PortWatch uses Wikidata as its primary port reference dataset. This guide walks through the SPARQL queries we use and the issues we hit.

Basic query: every seaport with coordinates and country

The Wikidata class for seaport is Q44782. A simple query that returns every seaport with at least a coordinate location and a country property looks like this:

    SELECT ?port ?portLabel ?country ?countryLabel ?coord WHERE {
      ?port wdt:P31/wdt:P279* wd:Q44782;
            wdt:P17 ?country;
            wdt:P625 ?coord.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5000

The P31/P279* path catches not only direct seaport instances but also instances of its subclasses, such as container port and oil terminal.
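The ?coord value comes back as a WKT literal of the form Point(longitude latitude), with longitude first, which is easy to get backwards. A minimal Python sketch for flattening one result row, assuming the standard SPARQL JSON results layout (results.bindings); the helper names parse_wkt_point and flatten_row are illustrative, not part of any library.

```python
import re

# Matches "Point(<lon> <lat>)" as returned for wdt:P625 values.
_NUM = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_POINT_RE = re.compile(rf"^Point\(\s*({_NUM})\s+({_NUM})\s*\)$")


def parse_wkt_point(wkt):
    """Return (lat, lon) from a 'Point(lon lat)' literal, or None if it does not match."""
    match = _POINT_RE.match(wkt.strip())
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def flatten_row(binding):
    """Turn one element of results.bindings into a flat dict."""
    def value(name):
        cell = binding.get(name)
        return cell["value"] if cell else None

    port_uri = value("port")
    coord = value("coord")
    return {
        # The entity URI ends in the Q-identifier.
        "qid": port_uri.rsplit("/", 1)[-1] if port_uri else None,
        "label": value("portLabel"),
        "country": value("countryLabel"),
        "latlon": parse_wkt_point(coord) if coord else None,
    }
```

Returning None for an unparseable coordinate, rather than raising, keeps one odd row from aborting a whole ingest; whether that is the right trade-off depends on the pipeline.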

Adding UN/LOCODE

UN/LOCODE on Wikidata is property P1937. To return only ports that have a UN/LOCODE assignment, add the triple pattern ?port wdt:P1937 ?locode. to the WHERE clause and ?locode to the SELECT list. In practice not every seaport entity in Wikidata has a P1937 value; smaller and inland ports are particularly sparse. PortWatch ingests both the with-LOCODE and without-LOCODE results and joins to the official UN/LOCODE registry as a secondary pass.
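One way to produce both result sets from a single template is to switch between a required triple and a SPARQL OPTIONAL block. The sketch below is an assumption about how such a builder could look, not a description of the PortWatch pipeline; build_port_query and its locode parameter are hypothetical names.

```python
def build_port_query(locode="optional", limit=5000):
    """Build the seaport query.

    locode: "required" keeps only ports with P1937,
            "optional" returns all ports and binds ?locode when present,
            "none" omits the property entirely.
    """
    clauses = {
        "required": "  ?port wdt:P1937 ?locode.\n",
        "optional": "  OPTIONAL { ?port wdt:P1937 ?locode. }\n",
        "none": "",
    }
    if locode not in clauses:
        raise ValueError(f"unknown locode mode: {locode!r}")

    select = "?port ?portLabel ?country ?countryLabel ?coord"
    if locode != "none":
        select += " ?locode"

    return (
        f"SELECT {select} WHERE {{\n"
        "  ?port wdt:P31/wdt:P279* wd:Q44782;\n"
        "        wdt:P17 ?country;\n"
        "        wdt:P625 ?coord.\n"
        f"{clauses[locode]}"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
```

With "optional", rows without a LOCODE simply have no ?locode binding, so the secondary join against the UN/LOCODE registry can target exactly those rows.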

Pitfalls

The biggest pitfall is duplicate entities. Many ports have two or three Wikidata items: one for the harbour basin, one for the port authority as an organisation, and one for the surrounding city. The basin and the city often share a coordinate; the port authority sometimes does not. Filter on instance of (P31) seaport, and where several candidates remain, prefer the entity whose coordinates fall inside the port basin polygon if you can compute it.
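The polygon preference can be sketched with a plain ray-casting test. This assumes the basin polygon is available from some other source as a list of (lon, lat) vertices and treats coordinates as planar, which is a reasonable approximation at harbour scale but not near the antimeridian or the poles. The names point_in_polygon and pick_entity are illustrative.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices, not closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def pick_entity(candidates, polygon):
    """Return the first candidate whose coordinate lies inside the polygon.

    candidates: list of dicts with "qid", "lat" and "lon" keys.
    Returns None when no candidate falls inside, leaving the choice to the caller.
    """
    for candidate in candidates:
        if point_in_polygon(candidate["lon"], candidate["lat"], polygon):
            return candidate
    return None
```

Points lying exactly on an edge may be classified either way by this test, so it is best used as a preference rather than a hard filter.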

The second pitfall is rate limiting. The Wikidata public endpoint enforces query timeouts (about a minute) and rate limits per IP. Long-running queries should be paginated with LIMIT and OFFSET (with an ORDER BY, since SPARQL does not otherwise guarantee a stable row order between pages), or split by country code, or run against a self-hosted Wikidata Query Service if scale matters.
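A minimal pagination loop, assuming a caller-supplied run_query function that sends one query string to the endpoint and returns a list of rows. Both fetch_all and run_query are hypothetical names; the HTTP call, a descriptive User-Agent header, and any pause between requests are left to that function.

```python
def fetch_all(run_query, body, page_size=1000, order_by="?port", max_pages=100):
    """Fetch every row of a query by walking LIMIT/OFFSET pages.

    run_query: callable taking a SPARQL string and returning a list of rows.
    body:      the query without ORDER BY, LIMIT or OFFSET.
    max_pages: safety cap so a misbehaving endpoint cannot loop forever.
    """
    rows = []
    for page in range(max_pages):
        query = (
            f"{body}\n"
            f"ORDER BY {order_by}\n"  # stable order so pages do not overlap
            f"LIMIT {page_size}\n"
            f"OFFSET {page * page_size}"
        )
        batch = run_query(query)
        rows.extend(batch)
        if len(batch) < page_size:
            break  # a short page means the result set is exhausted
    return rows
```

Sorting a large result set is itself expensive, so for very large queries splitting by country code may hit the timeout less often than deep OFFSET pages.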

Why we use Wikidata

Wikidata content is published under CC0, which permits unrestricted re-use including in commercial directories. The data quality is uneven — well-loved ports are richly described, obscure ones are not — but the licensing makes it the obvious starting point for any open maritime directory.