Abhij

Reputation: 1272

Address parsing

I need to parse Indian addresses the way Google does, and I am looking for examples of how to do it. Are there any examples of address parsing? Are there any free dictionaries of Indian cities, localities, states, pincodes, etc.?

for example

5/802,vedvihar society,near chandni chowk, pune,411038

will parse to

building/street=5
house no=802
locality/society=vedvihar
landmark=chandni chowk
city=pune
pin=411038

Upvotes: 1

Views: 4420

Answers (5)

WojtylaCz

Reputation: 368

Once you get rid of the HTML tags, the powerful open-source library libpostal fits this use case very nicely. libpostal is a C library for parsing and normalizing street addresses around the world using statistical NLP and open data, and it has bindings for several programming languages. The goal of the project is to understand location-based strings in every language, everywhere.

For Java, there is jpostal.

I have created a simple Docker image with the Python binding pypostal, so you can spin it up and try it very easily: pypostal-docker

Upvotes: 1

YCI

Reputation: 369

I don't know the context of your question, so this may be completely off topic, but here is what I did a few months ago: I worked around the very complex natural language processing part by using the Google Geocoding API.

The API lets you send a full-text address and get back well-formatted XML from which you can easily extract the street, city, or whatever other information you need.
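To make the request side concrete, here is a minimal sketch of building the request URL in Java (10 or later). It assumes the current Geocoding web service endpoint with XML output and an API key passed as the key parameter; the class name GeocodeUrl and the method build are made up for illustration, and the actual HTTP call and XML parsing are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GeocodeUrl {
    // Assumed endpoint: Geocoding web service with XML output.
    private static final String ENDPOINT =
            "https://maps.googleapis.com/maps/api/geocode/xml";

    // Builds the request URL for a free-text address.
    // Both parameters are URL-encoded (spaces become '+', commas become %2C).
    public static String build(String address, String apiKey) {
        return ENDPOINT
                + "?address=" + URLEncoder.encode(address, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // YOUR_API_KEY is a placeholder, not a real key.
        System.out.println(build(
                "5/802,vedvihar society,near chandni chowk, pune,411038",
                "YOUR_API_KEY"));
    }
}
```

The response can then be fetched with any HTTP client and read with a standard XML parser.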

Maybe this is not the solution you are looking for, but if you can use the Maps API you will save a lot of time and effort: http://code.google.com/apis/maps/documentation/geocoding/

Upvotes: 0

Michael Borgwardt

Reputation: 346377

Are there any free dictionaries available of Indian city, locality, states, pincodes etc

geonames.org has a downloadable database of towns, including postal codes and administrative divisions.
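As a rough sketch of how such a download could be used as a pincode dictionary: the GeoNames postal code files are tab-separated text, and the code below assumes the column order given in the dump's readme (country code, postal code, place name, first-level administrative name, and so on). The class PincodeDictionary and its methods are hypothetical, and the row in main is a placeholder, not real data.

```java
import java.util.HashMap;
import java.util.Map;

public class PincodeDictionary {
    // Maps postal code -> {place name, state name}.
    private final Map<String, String[]> byPin = new HashMap<>();

    // Adds one tab-separated row. Assumed columns:
    // 0 = country code, 1 = postal code, 2 = place name, 3 = state name.
    // Rows with fewer than four columns are ignored.
    public void addRow(String row) {
        String[] cols = row.split("\t", -1);
        if (cols.length < 4) {
            return;
        }
        // A postal code can cover several places; keep the first one seen.
        byPin.putIfAbsent(cols[1], new String[] { cols[2], cols[3] });
    }

    // Returns the place name for a postal code, or null if unknown.
    public String placeFor(String pin) {
        String[] entry = byPin.get(pin);
        return entry == null ? null : entry[0];
    }

    // Returns the state name for a postal code, or null if unknown.
    public String stateFor(String pin) {
        String[] entry = byPin.get(pin);
        return entry == null ? null : entry[1];
    }

    public static void main(String[] args) {
        PincodeDictionary dict = new PincodeDictionary();
        // Placeholder row for illustration only.
        dict.addRow("IN\t000000\tExample Place\tExample State");
        System.out.println(dict.placeFor("000000") + ", " + dict.stateFor("000000"));
    }
}
```

In practice the rows would be read line by line from the downloaded file for India rather than added by hand.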

Upvotes: 1

Kuldeep Jain

Reputation: 8598

Here are a few links that may help with parsing postal addresses:

"Parse usable Street Address, City, State, Zip from a string" and "Java postal address parser"

Upvotes: 3

Bernd Elkemann

Reputation: 23550

You can use StringTokenizer ( http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html ), for which you can find a tutorial here: http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example

In that example the string is split on space boundaries; in your case you would replace the " " with "," in the line: StringTokenizer st = new StringTokenizer(tags," ");

Make sure to call trim() on each substring, since the parts may have leading or trailing spaces.
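Put together, a minimal sketch of the comma-splitting approach could look like the following. The class name AddressTokens and the method split are made up for illustration. This only separates the parts; it does not decide which part is the city, landmark, or pin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class AddressTokens {
    // Splits a comma-separated address into trimmed, non-empty parts.
    public static List<String> split(String address) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(address, ",");
        while (st.hasMoreTokens()) {
            String part = st.nextToken().trim();
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String address = "5/802,vedvihar society,near chandni chowk, pune,411038";
        for (String part : split(address)) {
            System.out.println(part);
        }
    }
}
```

For the address in the question this yields five parts: "5/802", "vedvihar society", "near chandni chowk", "pune", and "411038".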

Let me know if you need additional info.

Upvotes: 1
