Wikidata is the structured-data sister project of Wikipedia, and it carries a remarkable amount of port and harbour data — including UN/LOCODEs, coordinates, country, port type, and links to the Wikipedia article in dozens of languages. PortWatch uses Wikidata as its primary port reference dataset. This guide walks through the SPARQL queries we use and the issues we hit.
Basic query: every seaport with coordinates and country
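The Wikidata class for seaport is Q44782. A simple query that returns seaports with at least a coordinate location (P625) and a country (P17) looks like this:

    SELECT ?port ?portLabel ?country ?countryLabel ?coord WHERE {
      ?port wdt:P31/wdt:P279* wd:Q44782;
            wdt:P17 ?country;
            wdt:P625 ?coord.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5000

The P31/P279* path catches not only direct seaport instances but also instances of subclasses such as container port and oil terminal. The language code must be wrapped in straight double quotes; typographic quotes pasted from a word processor will make the query fail to parse. LIMIT 5000 caps the result set, so anything beyond the first 5000 matches needs pagination.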
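The query above can be run from a script as well as from the web interface. The following is a minimal sketch using only the Python standard library, not a description of PortWatch's actual ingestion code. The helper names (parse_point, run_query, to_row) and the User-Agent string are placeholders invented for this example. It assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON result layout, in which P625 values arrive as WKT literals of the form Point(longitude latitude).

```python
import json
import re
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

SEAPORT_QUERY = """
SELECT ?port ?portLabel ?country ?countryLabel ?coord WHERE {
  ?port wdt:P31/wdt:P279* wd:Q44782;
        wdt:P17 ?country;
        wdt:P625 ?coord.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5000
"""

# WKT point literal: longitude comes first, then latitude.
_POINT_RE = re.compile(r"^Point\(([-+0-9.eE]+)\s+([-+0-9.eE]+)\)$")


def parse_point(wkt):
    """Return (lat, lon) from a WKT point literal, or None if it does not match."""
    match = _POINT_RE.match(wkt.strip())
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def run_query(query, user_agent="PortDirectoryExample/0.1 (contact@example.org)"):
    """Send one query to the public endpoint and return the raw result bindings."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,  # placeholder; identify your own tool here
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=90) as response:
        return json.load(response)["results"]["bindings"]


def to_row(binding):
    """Flatten one result binding into a plain dict."""
    return {
        "qid": binding["port"]["value"].rsplit("/", 1)[-1],
        "name": binding.get("portLabel", {}).get("value"),
        "country": binding.get("countryLabel", {}).get("value"),
        "latlon": parse_point(binding["coord"]["value"]),
    }
```

A typical call would be rows = [to_row(b) for b in run_query(SEAPORT_QUERY)]. Returning coordinates as (lat, lon) is a choice made for this sketch; the important detail is that the literal itself is longitude-first.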
Adding UN/LOCODE
UN/LOCODE on Wikidata is property P1937. To return only ports that have a UN/LOCODE assignment, add the triple pattern ?port wdt:P1937 ?locode. to the WHERE clause and ?locode to the SELECT list. In practice not every seaport entity in Wikidata has a P1937 value; coverage is particularly sparse for smaller and inland ports. PortWatch ingests both the with-LOCODE and without-LOCODE results and joins to the official UN/LOCODE registry as a secondary pass.
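One way to get the with-LOCODE and without-LOCODE results in a single pass is to wrap the P1937 pattern in OPTIONAL, then split the rows during the registry join. The sketch below illustrates that idea under stated assumptions: the registry is assumed to be already loaded into a dict keyed by the five-character code, each row is assumed to carry a "locode" key (possibly None), and the function names are hypothetical. How PortWatch matches rows that have no LOCODE is not specified here, so those rows are simply returned for a later step.

```python
# Variant of the basic query: OPTIONAL keeps ports that lack a P1937 value.
SEAPORT_WITH_LOCODE_QUERY = """
SELECT ?port ?portLabel ?country ?countryLabel ?coord ?locode WHERE {
  ?port wdt:P31/wdt:P279* wd:Q44782;
        wdt:P17 ?country;
        wdt:P625 ?coord.
  OPTIONAL { ?port wdt:P1937 ?locode. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5000
"""


def normalise_locode(raw):
    """Collapse 'NL RTM', 'nlrtm' and 'NL-RTM' to 'NLRTM'; return None if malformed."""
    code = "".join(ch for ch in raw.upper() if ch.isalnum())
    # A UN/LOCODE is a two-letter country code plus a three-character location code.
    if len(code) == 5 and code[:2].isalpha():
        return code
    return None


def join_to_registry(rows, registry):
    """Split rows into (matched, unmatched) against a registry keyed by LOCODE.

    rows: dicts with an optional 'locode' value.
    registry: dict mapping 'NLRTM'-style codes to dicts with a 'name' key
              (an assumed shape for this example).
    """
    matched, unmatched = [], []
    for row in rows:
        code = normalise_locode(row.get("locode") or "")
        entry = registry.get(code) if code else None
        if entry is not None:
            matched.append({**row, "locode": code, "registry_name": entry["name"]})
        else:
            # No LOCODE on Wikidata, or a code the registry does not know.
            unmatched.append(row)
    return matched, unmatched
```

Normalising before the lookup matters because registry listings are often printed with a space between the country and location parts, while other sources omit it.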
Pitfalls
The biggest pitfall is duplicate entities. Many ports have two or three Wikidata items — one for the harbour basin, one for the port authority as an organisation, and one for the surrounding city. The basin and the city often share a coordinate; the port authority sometimes does not. Filter on instance-of-seaport and prefer the entity whose coordinates fall inside the port basin polygon if you can compute it.
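The selection rule described above can be sketched as follows. This is an illustration rather than PortWatch's implementation: it assumes the duplicate candidates for one port have already been grouped, that each candidate is a dict with hypothetical keys "qid", "latlon" and "is_seaport", and that a basin polygon is available as a list of (lat, lon) vertices. The containment check is a plain ray-casting test on planar coordinates, which is a reasonable approximation for port-sized polygons that do not cross the antimeridian.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test. polygon is a list of (lat, lon) vertices; the last
    vertex is joined back to the first automatically."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def pick_canonical(candidates, basin_polygon=None):
    """Choose one entity from a group of duplicates for the same port.

    Preference order: seaport instances first, then (if a polygon is given)
    the first seaport whose coordinates fall inside the basin polygon.
    """
    ports = [c for c in candidates if c["is_seaport"]] or list(candidates)
    if basin_polygon:
        inside = [
            c for c in ports
            if c["latlon"] and point_in_polygon(*c["latlon"], basin_polygon)
        ]
        if inside:
            return inside[0]
    # Fallback is simply the first remaining candidate; a real pipeline
    # would probably want a more deliberate tie-break.
    return ports[0] if ports else None
```

For example, given a city item and a basin item that share a coordinate plus a port authority item located elsewhere, the function drops the city (not a seaport instance) and returns the basin item when a polygon is supplied.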
The second pitfall is rate limiting. The Wikidata public endpoint enforces query timeouts (about a minute) and rate limits per IP. Long-running queries should be paginated with LIMIT and OFFSET (with an ORDER BY clause so that page boundaries stay stable between requests), or split by country code, or run against a self-hosted Wikidata Query Service if scale matters.
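A pagination loop along these lines might look like the sketch below. It is written under several assumptions: the base query contains no LIMIT, OFFSET or ORDER BY of its own and binds a ?port variable, the caller supplies a run_query callable that takes a query string and returns a list of rows, and a rate-limited request surfaces as an HTTP 429 raised as urllib.error.HTTPError. The function names, page size and pause length are illustrative values, not recommendations from the Wikidata operators.

```python
import time
import urllib.error


def paged_queries(base_query, page_size=1000, max_pages=50):
    """Yield the base query with ORDER BY / LIMIT / OFFSET appended per page."""
    for page in range(max_pages):
        yield (
            f"{base_query.rstrip()}\n"
            f"ORDER BY ?port\n"
            f"LIMIT {page_size}\n"
            f"OFFSET {page * page_size}"
        )


def run_with_retry(run_query, query, attempts=4):
    """Call run_query, backing off and retrying when the server answers 429."""
    for attempt in range(attempts):
        try:
            return run_query(query)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == attempts - 1:
                raise
            # Honour a numeric Retry-After header, else back off exponentially.
            retry_after = (err.headers or {}).get("Retry-After", "") or ""
            delay = int(retry_after) if retry_after.isdigit() else 2 ** attempt
            time.sleep(delay)


def fetch_all(base_query, run_query, page_size=1000, max_pages=50, pause=1.0):
    """Collect rows page by page until a short page signals the end."""
    rows = []
    for query in paged_queries(base_query, page_size, max_pages):
        page = run_with_retry(run_query, query)
        rows.extend(page)
        if len(page) < page_size:
            break
        time.sleep(pause)  # be polite between pages
    return rows
```

Sorting on ?port keeps pages from overlapping or skipping rows, but the sort itself adds cost to every page. If individual pages start to hit the timeout, splitting the work by country is likely the better option.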
Why we use Wikidata
Wikidata content is published under CC0, which permits unrestricted re-use including in commercial directories. The data quality is uneven — well-loved ports are richly described, obscure ones are not — but the licensing makes it the obvious starting point for any open maritime directory.