
Geocoding Vietnamese addresses: edge cases

Diacritics, alley numbering, ward renames, and the other Vietnamese-address quirks that break naive geocoders — and how GoGoDuk handles them.

If you've shipped an app in Vietnam, you've watched a geocoder shrug at an address you can read with no trouble. Vietnamese addresses are full of edge cases that don't show up in a US dataset — and most off-the-shelf geocoders weren't trained on them.

Here are the cases that bite, and how we handle each in /v1/suggest and /v1/place/resolve.

1. Diacritics, half-diacritics, and ASCII fallbacks

A user might type "Hà Nội", "Ha Noi", "Hanoi", or "ha noi" depending on keyboard and habit. A naive geocoder treats these as four unrelated strings.

We normalize on the index side: every Vietnamese name is stored with its full-diacritic form plus an ASCII-folded variant. At query time, we fold the input the same way and match against both columns. The heavy lifting happens once at index time, and folding a short query string is cheap, so the approach pays off on every request.
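A minimal Python sketch of that folding, assuming plain Unicode normalization (NFD decomposition, then dropping combining marks); the post doesn't describe GoGoDuk's actual normalizer. One Vietnamese-specific catch: đ/Đ is its own letter, not d plus a combining mark, so NFD leaves it alone and it has to be mapped by hand. The `compact_key` helper and the toy `INDEX`/`lookup` pair are hypothetical illustrations, not part of the API.

```python
import unicodedata
from typing import List


def fold_vietnamese(text: str) -> str:
    """ASCII-fold a Vietnamese string: strip marks, lowercase, collapse whitespace."""
    # Đ/đ (U+0110/U+0111) have no canonical decomposition, so NFD would keep them.
    text = text.replace("Đ", "D").replace("đ", "d")
    # NFD splits a precomposed letter such as "ộ" into "o" plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (category "Mn"): tones, circumflex, breve, horn.
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return " ".join(base_only.lower().split())


def compact_key(text: str) -> str:
    """Hypothetical extra key: the folded form with spaces removed."""
    return fold_vietnamese(text).replace(" ", "")


# Index side (toy data): each display name stored with its folded and compact forms.
NAMES = ["Hà Nội", "Đà Nẵng", "Hào Nam"]
INDEX = [(name, fold_vietnamese(name), compact_key(name)) for name in NAMES]


def lookup(query: str) -> List[str]:
    """Query side: fold once, then compare against every stored form."""
    folded, compact = fold_vietnamese(query), compact_key(query)
    return [name for name, f, c in INDEX if query == name or folded == f or compact == c]


print(lookup("ha noi"))  # → ['Hà Nội']
print(lookup("Hanoi"))   # → ['Hà Nội']
```

Folding alone doesn't join "Hanoi" to "Ha Noi", because the word boundary differs; that is why this sketch adds a space-free key as a third comparison.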

2. Alley numbering (Ngõ / Hẻm)

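Hanoi addresses love nested alleys: "Số 5, ngõ 12, ngách 3, phố Hào Nam" means house 5, in alley 3, off alley 12, on Hào Nam street. A flat tokenizer that treats every comma-separated segment as an independent field loses that nesting, and can easily mistake an alley number for the house number.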

Our parser recognizes the ngõ / ngách / hẻm / kiệt tokens as alley-depth markers and preserves their order. We index the alleys separately from the parent street, so suggestions surface intermediate results ("Ngõ 12 Hào Nam") before falling all the way through to a specific door number.
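The idea can be sketched in Python as a comma-split parser that tags each segment by its leading word. This is a hypothetical illustration, not GoGoDuk's parser: the depth table (ngõ, hẻm, and kiệt at the first level, ngách one level deeper) is an assumption, and real addresses also use forms this ignores, such as slash-joined alley numbers. `intermediate_labels` shows how the parsed chain yields the coarse-to-fine suggestions described above.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed depth table (hypothetical): ngõ (Hanoi), hẻm (HCMC) and kiệt (central
# Vietnam) branch off a street; ngách branches off a ngõ.
ALLEY_DEPTH = {"ngõ": 1, "hẻm": 1, "kiệt": 1, "ngách": 2}
STREET_WORDS = {"phố", "đường"}


@dataclass
class ParsedAddress:
    house: Optional[str] = None
    street: Optional[str] = None
    # (marker, number) pairs, outermost alley first.
    alleys: List[Tuple[str, str]] = field(default_factory=list)


def parse_address(address: str) -> ParsedAddress:
    """Split a comma-separated address into house number, alley chain, and street."""
    result = ParsedAddress()
    found = []  # (depth, position, marker, number)
    text = unicodedata.normalize("NFC", address)
    for position, part in enumerate(p.strip() for p in text.split(",")):
        words = part.split(maxsplit=1)
        if len(words) < 2:
            continue  # a lone token with no value; ignored in this sketch
        head, value = words[0].lower(), words[1]
        if head == "số":
            result.house = value
        elif head in ALLEY_DEPTH:
            found.append((ALLEY_DEPTH[head], position, head, value))
        elif head in STREET_WORDS:
            result.street = value
    # Sort by depth, then text position, so either segment order gives the same chain.
    result.alleys = [(marker, number) for _, _, marker, number in sorted(found)]
    return result


def intermediate_labels(addr: ParsedAddress) -> List[str]:
    """Coarse-to-fine suggestion labels: street, each alley level, then the door."""
    labels: List[str] = []
    current = addr.street or ""
    if current:
        labels.append(current)
    for marker, number in addr.alleys:
        current = f"{marker.capitalize()} {number} {current}".strip()
        labels.append(current)
    if addr.house:
        labels.append(f"Số {addr.house} {current}".strip())
    return labels


print(intermediate_labels(parse_address("Số 5, ngõ 12, ngách 3, phố Hào Nam")))
```

Ordering the chain by marker depth rather than by raw text position is this sketch's own design choice; it makes "ngõ 12, ngách 3" and "ngách 3, ngõ 12" parse to the same hierarchy.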

3. Ward renames and merges

Vietnam's administrative divisions reorganize regularly. Wards get renamed, merged, or split — old addresses still circulate for years.

We track both the current ward name and the historical aliases in our admin-boundaries dataset (all 11,000+ wards). A search for an old ward name still resolves; reverse-geocoding always returns the current official name.
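A sketch of that alias table in Python, under stated assumptions: the ward IDs, names, and the `WardIndex` class are hypothetical placeholders, not the real dataset. Because a split turns one old name into several current wards, an alias maps to a list of IDs rather than a single one. Comparisons here use `casefold()` only; a real index would also apply the diacritic folding from section 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Ward:
    ward_id: str
    current_name: str
    aliases: Tuple[str, ...] = ()  # historical names that still circulate


class WardIndex:
    """Forward lookup accepts current or historical names; reverse returns current only."""

    def __init__(self, wards: List[Ward]) -> None:
        self._by_id: Dict[str, Ward] = {w.ward_id: w for w in wards}
        # A merge maps several old names to one ward; a split maps one old name
        # to several wards. Storing name -> list of IDs covers both.
        self._by_name: Dict[str, List[str]] = {}
        for w in wards:
            for name in (w.current_name, *w.aliases):
                self._by_name.setdefault(name.casefold(), []).append(w.ward_id)

    def resolve(self, name: str) -> List[str]:
        """Forward: any known name (current or old) -> current official names."""
        ids = self._by_name.get(name.casefold(), [])
        return [self._by_id[i].current_name for i in ids]

    def reverse(self, ward_id: str) -> str:
        """Reverse: the ward found for a coordinate always reports its current name."""
        return self._by_id[ward_id].current_name


# Placeholder data: one merge (A + B -> AB) and one split (C -> C1, C2).
wards = WardIndex([
    Ward("W1", "New Ward AB", aliases=("Old Ward A", "Old Ward B")),
    Ward("W2", "Ward C1", aliases=("Old Ward C",)),
    Ward("W3", "Ward C2", aliases=("Old Ward C",)),
])
print(wards.resolve("old ward a"))  # → ['New Ward AB']
```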

4. Numbered streets that aren't streets

"Đường số 7" ("Street number 7") shows up dozens of times across HCMC, each in a different ward. A bare-name match has no way to choose among them and often returns the wrong one.

The fix isn't fancy: whenever the query includes ward or district context, we use it to anchor numbered-street matches, and we always surface the ward in the suggestion result so the user can disambiguate.
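A minimal Python sketch of the anchoring step, assuming candidates arrive as street/ward/district records. The ward and district names below are placeholders, not real HCMC data, and `anchor_numbered_street` is a hypothetical helper. Falling back to the full list when the context matches nothing is this sketch's own choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StreetMatch:
    street: str
    ward: str
    district: str


def anchor_numbered_street(
    candidates: List[StreetMatch],
    ward: Optional[str] = None,
    district: Optional[str] = None,
) -> List[str]:
    """Narrow same-named streets by ward/district context, then label each with its ward."""

    def fits(c: StreetMatch) -> bool:
        # Only constrain on the context the query actually supplied.
        if ward and c.ward.casefold() != ward.casefold():
            return False
        if district and c.district.casefold() != district.casefold():
            return False
        return True

    anchored = [c for c in candidates if fits(c)]
    # Assumption: if the context matches nothing, show every candidate rather than none.
    shown = anchored or candidates
    # Always include the ward in the label so the user can tell results apart.
    return [f"{c.street}, {c.ward}, {c.district}" for c in shown]


# Placeholder candidates: three same-named streets in different wards.
CANDIDATES = [
    StreetMatch("Đường số 7", "Ward A", "District X"),
    StreetMatch("Đường số 7", "Ward B", "District X"),
    StreetMatch("Đường số 7", "Ward C", "District Y"),
]
print(anchor_numbered_street(CANDIDATES, ward="Ward B"))
```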

5. Building names vs street addresses

"Vincom Center Bà Triệu" is a building, not an address. A user might type either form. We resolve building names to their canonical address and return both — the result includes name (building) and address (street) so the client picks what to display.
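On the client, picking a label might look like the Python sketch below. The post only says results carry a `name` (building) and an `address` (street); treating a result as a flat dict with those two keys is an assumption, and the `display_label` helper is hypothetical.

```python
from typing import Any, Dict


def display_label(result: Dict[str, Any]) -> str:
    """Pick a display string from a result carrying `name` and `address` fields."""
    name = (result.get("name") or "").strip()
    address = (result.get("address") or "").strip()
    if name and address and name != address:
        # A building hit: lead with the recognizable name, keep the street for context.
        return f"{name} ({address})"
    # A plain street address, or only one field present: show whatever exists.
    return name or address
```

Whether to lead with the building name or the street is a UI decision; returning both fields leaves that choice to the client.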

Try it

curl -H "X-API-Key: gdk_live_..." \
  "https://api.gogoduk.com/v1/suggest?input=ngo+12+hao+nam"
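The same request from Python's standard library, as a sketch: the endpoint, `input` parameter, and `X-API-Key` header come from the curl line above, while decoding the body as JSON is an assumption, since the post doesn't show the response. `urlencode` percent-encodes diacritics as UTF-8, so a query typed with full diacritics is sent just as safely as the folded form.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.gogoduk.com/v1/suggest"


def build_suggest_request(query: str, api_key: str) -> urllib.request.Request:
    """Build the same GET request as the curl example."""
    # urlencode uses quote_plus: spaces become "+", non-ASCII becomes UTF-8 %XX.
    url = f"{BASE_URL}?{urllib.parse.urlencode({'input': query})}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def suggest(query: str, api_key: str) -> object:
    """Send the request and decode the body (assumed JSON; its shape isn't covered here)."""
    with urllib.request.urlopen(build_suggest_request(query, api_key), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Call it as `suggest("ngõ 12 hào nam", "gdk_live_...")`, substituting your own key.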

The free tier handles all of the above. If you hit a case we don't (odd romanization, a regional address pattern we missed), email us a sample query and we'll look at it.

Want to use GoGoDuk?

Free forever — 100 requests/day per account, no credit card. Higher limits on request.
