Reputation: 1272
I have to parse Indian addresses the way Google does, and I need some examples of how to parse an address. Are there any existing examples of address parsing? Also, are there any free dictionaries of Indian cities, localities, states, PIN codes, etc.?
For example,
5/802,vedvihar society,near chandni chowk, pune,411038
should parse to:
building/street=5
house no=802
locality/society=vedvihar
landmark=chandni chowk
city=pune
pin=411038
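A rule-based sketch of the parse shown above, assuming addresses follow the same comma-separated pattern as the example. The heuristics (a "number/number" prefix for building and house number, a "near ..." segment as landmark, a "... society" segment as locality, a six-digit PIN, and the last unlabeled segment as city) are hypothetical and only cover this one shape; real Indian addresses vary far more, so treat this as a starting point rather than a general parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveIndianAddressParser {
    // Indian PIN codes are six digits and do not start with 0.
    private static final Pattern PIN = Pattern.compile("^[1-9][0-9]{5}$");
    // Hypothetical rule: a "5/802" style segment means building/street number / house number.
    private static final Pattern BUILDING_HOUSE = Pattern.compile("^(\\w+)/(\\w+)$");

    public static Map<String, String> parse(String address) {
        Map<String, String> out = new LinkedHashMap<>();
        String lastPlain = null;
        for (String raw : address.split(",")) {
            String part = raw.trim();
            if (part.isEmpty()) continue;
            String lower = part.toLowerCase();
            Matcher m = BUILDING_HOUSE.matcher(part);
            if (PIN.matcher(part).matches()) {
                out.put("pin", part);
            } else if (m.matches()) {
                out.put("building/street", m.group(1));
                out.put("house no", m.group(2));
            } else if (lower.startsWith("near ")) {
                out.put("landmark", part.substring("near ".length()).trim());
            } else if (lower.endsWith(" society")) {
                out.put("locality/society",
                        part.substring(0, part.length() - " society".length()).trim());
            } else {
                // Hypothetical rule: the last segment no other rule matched is taken as the city.
                lastPlain = part;
            }
        }
        if (lastPlain != null) out.put("city", lastPlain);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("5/802,vedvihar society,near chandni chowk, pune,411038"));
    }
}
```

Each new pattern (e.g. "opp.", "behind", "nagar", "colony") needs another rule, which is why a dictionary of cities and localities helps so much.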
Upvotes: 1
Views: 4420
Reputation: 368
If you strip out the HTML tags, the powerful open-source library libpostal fits this use case very nicely, and it has bindings for several programming languages. Libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of the project is to understand location-based strings in every language, everywhere.
For Java, there is jpostal.
I have also created a simple Docker image with the Python binding, pypostal, that you can spin up to try it very easily: pypostal-docker
Upvotes: 1
Reputation: 369
I don't know the context of your question, so maybe this is completely off topic, but here is what I did a few months ago: I worked around the very complex natural language processing part by using the Google Geocoding API.
The API lets you send a full-text address and get back well-formatted XML from which you can easily extract the street, city or whatever other information you need.
Maybe this is not the solution you are looking for, but if you can use the Maps API you will save a lot of time and effort: http://code.google.com/apis/maps/documentation/geocoding/
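A minimal Java sketch of that approach, assuming the current Geocoding web service rather than the old documentation URL above: it builds a request for the XML output format (https://maps.googleapis.com/maps/api/geocode/xml, which now requires an API key) and pulls the address_component entries out of a response with the JDK's DOM parser. The element names (address_component, long_name, type) follow Google's documented XML layout; the class and method names are hypothetical, and the HTTP call itself is left out.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GeocodeXmlSketch {
    // XML flavour of the Geocoding web service; an API key is required.
    private static final String ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/xml";

    /** Builds the request URL for a free-text address; both parameters are URL-encoded. */
    public static String buildRequestUrl(String address, String apiKey) {
        return ENDPOINT
                + "?address=" + URLEncoder.encode(address, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    /**
     * Maps the first <type> of each <address_component> to its <long_name>.
     * If several results are returned, putIfAbsent keeps the value from the first one.
     */
    public static Map<String, String> extractComponents(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> out = new LinkedHashMap<>();
        NodeList comps = doc.getElementsByTagName("address_component");
        for (int i = 0; i < comps.getLength(); i++) {
            Element c = (Element) comps.item(i);
            // Only <type>/<long_name> elements inside this component are searched.
            NodeList types = c.getElementsByTagName("type");
            NodeList names = c.getElementsByTagName("long_name");
            if (types.getLength() > 0 && names.getLength() > 0) {
                out.putIfAbsent(types.item(0).getTextContent(), names.item(0).getTextContent());
            }
        }
        return out;
    }
}
```

The returned map is keyed by Google's component types (for example locality or postal_code), so you still have to map those onto your own field names.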
Upvotes: 0
Reputation: 346377
Are there any free dictionaries available of Indian city, locality, states, pincodes etc
geonames.org has a downloadable database of towns, including postal codes and administrative divisions.
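As a sketch of how that data could back a PIN code lookup: the GeoNames postal code dump is a tab-separated UTF-8 file, and its India extract can be indexed by PIN in plain Java. The column order used below (country code, postal code, place name, admin name1, admin code1, admin name2, ...) is assumed from the dump's readme; for India, admin name1 should be the state and admin name2 the district, which you should verify against the actual file. The class and field names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GeoNamesPostalIndex {
    /** The columns of one dump row that this sketch keeps. */
    public static final class Place {
        public final String postalCode, placeName, state, district;

        Place(String postalCode, String placeName, String state, String district) {
            this.postalCode = postalCode;
            this.placeName = placeName;
            this.state = state;
            this.district = district;
        }
    }

    /**
     * Reads tab-separated rows and groups them by postal code (one PIN can cover
     * several places). Assumed columns: 0 country code, 1 postal code, 2 place name,
     * 3 admin name1, 4 admin code1, 5 admin name2, ...
     */
    public static Map<String, List<Place>> load(Reader in) throws IOException {
        Map<String, List<Place>> byPin = new HashMap<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            String[] f = line.split("\t", -1); // -1 keeps empty columns
            if (f.length < 6) continue;        // skip malformed or short lines
            Place p = new Place(f[1], f[2], f[3], f[5]);
            byPin.computeIfAbsent(p.postalCode, k -> new ArrayList<>()).add(p);
        }
        return byPin;
    }
}
```

The caller owns the Reader (for example a UTF-8 reader over the unzipped file) and is responsible for closing it.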
Upvotes: 1
Reputation: 8598
Here are a few links that may help with parsing postal addresses:
Parse usable Street Address, City, State, Zip from a string
Java postal address parser
Upvotes: 3
Reputation: 23550
You can use StringTokenizer ( http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html ), for which you can find a tutorial here: http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example
In the example the string is split on space boundaries; in your case you would want to replace the " " with "," in the line: StringTokenizer st = new StringTokenizer(tags," ");
Make sure to call aString.trim() on your substrings.
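A self-contained version of that suggestion, splitting on "," and trimming each token. Note that the StringTokenizer Javadoc describes it as a legacy class and recommends String.split or java.util.regex for new code, so address.split("\\s*,\\s*") would do much the same job. Either way this only splits the string; labeling each piece (city, PIN, landmark) still needs additional rules. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CommaTokenizerExample {
    /** Splits an address on commas and trims each piece. */
    public static List<String> tokens(String address) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(address, ",");
        while (st.hasMoreTokens()) {
            String t = st.nextToken().trim();
            if (!t.isEmpty()) parts.add(t); // skip blank tokens such as the space in ", ,"
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String t : tokens("5/802,vedvihar society,near chandni chowk, pune,411038")) {
            System.out.println(t);
        }
    }
}
```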
Please let me know if you need additional info.
Upvotes: 1