Daniel Kaplan

Reputation: 67440

How do you convert a Java String to a mailing address object?

As input I am getting an address as a String. It may say something like "123 Fake Street\nLos Angeles, CA 99988". How can I convert this into an object with fields like this:

Address1
Address2
City
State
Zip Code

Or something similar to this? If there is a Java library that can do this, all the better.

Unfortunately, I don't have a choice about the String as input. It's part of a specification I'm trying to implement.

The input is not going to be very well structured so the code will need to be very fault tolerant. Also, the addresses could be from all over the world, but 99 out of 100 are probably in the US.

Upvotes: 4

Views: 10326

Answers (5)

Sachin Thapa

Reputation: 3719

You can use JGeocoder:

public static void main(String[] args) {
    Map<AddressComponent, String> parsedAddr = AddressParser.parseAddress("Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043");
    System.out.println(parsedAddr);

    Map<AddressComponent, String> normalizedAddr = AddressStandardizer.normalizeParsedAddress(parsedAddr);
    System.out.println(normalizedAddr);
}

Output will be:

{street=Amphitheatre, city=Mountain View, number=1600, zip=94043, state=CA, name=Google Inc, type=Parkway}
{street=AMPHITHEATRE, city=MOUNTAIN VIEW, number=1600, zip=94043, state=CA, name=GOOGLE INC, type=PKWY}

There is another library, International Address Parser, which has a trial version you can check out. It supports country detection as well.

AddressParser addressParser = AddressParser.getInstance();
AddressStandardizer standardizer = AddressStandardizer.getInstance(); // if enabled
AddressFormater formater = AddressFormater.getInstance();

String rawAddress = "101 Avenue des Champs-Elysées 75008 Paris";

// you can try to detect the country
CountryDetector detector = CountryDetector.getInstance();
String countryCode = detector.getCountryCode("7580 Commerce Center Dr ALABAMA");
System.out.println("detected country=" + countryCode);

Also, check the list of implemented countries for this library.

Cheers !!

Upvotes: 3

Matt

Reputation: 23759

I work at SmartyStreets where we develop address parsing and extraction algorithms.

It's hard.

If most of your addresses are in the US, you can use an address verification service to provide guaranteed accurate parse results (since the addresses are checked against a master list).

There are several providers out there, so take a look around and find one that suits you. Since you probably won't be able to install the database locally (not without a big fee, because address data is licensed by the USPS), look for one that offers a REST endpoint so you can just make an HTTP request. Since it sounds like you have a lot of addresses, make sure the API is high-performing and lets you do batch requests.
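
As a rough sketch of what calling such a REST endpoint from Java might look like: the URL and the `address` query parameter below are placeholders for illustration, not any provider's real API, and Java 11 or later is assumed for `java.net.http`. The sketch only builds the request; check your provider's documentation for the actual endpoint, parameters, and authentication.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class VerificationRequestSketch {
    // Hypothetical placeholder endpoint; substitute the URL from your provider's documentation.
    static final String ENDPOINT = "https://api.example.com/verify";

    // Builds a GET request carrying the raw address in a query parameter.
    // The parameter name "address" is an assumption for illustration only.
    static HttpRequest buildRequest(String rawAddress) {
        String query = "address=" + URLEncoder.encode(rawAddress, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + "?" + query))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("13001 Point Richmond Dr NW, Gig Harbor WA");
        // Print the request line instead of sending it, since the endpoint is a placeholder.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would be a matter of passing it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and parsing the JSON body that comes back.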

For example, with ours:

Input:

13001 Point Richmond Dr NW, Gig Harbor WA

Output:

[Image: address verified]

Or the more specific breakdown of components, if needed:

[Image: address components]

If the input is even messier, there are a few address extraction services available that can handle a little bit of noise within an address and parse addresses out of text and turn them into their components. (SmartyStreets offers this also, as a beta API. I believe some other NLP services do similar things too.)

Granted, this only works for US addresses. I'm not as expert on UK or Canadian addresses, but I believe they may be slightly simpler in general.

(Beyond a small handful of well-developed countries, international data is really hit-and-miss. Reliable data sets are hard to obtain or don't exist. But if you're on a really tight budget you could write your own parser for all the address formats.)

Upvotes: 2

Zero

Reputation: 1646

I assume the sequence of information is always the same, as in the user will never enter the postal code before the state. If I understood your question correctly, you need logic to process an address that may be incomplete (for example, missing a portion). One way to do it is to look for portions of the string you know are correct. You can treat the known parts of the address as separators. You will need city and state names and address words (such as "Street", "Avenue", "Road", etc.) in an array.

  1. Run indexOf for the cities, states, and address words, and store the results.
  2. Substring and cut out the first line of the address (from the start to the index of the address-signifying word plus its length).
  3. Check the index of the city name (found in step 1). If it's -1, skip this step. If it's 0, take the city out (0 also means address line 2 is not in the string). If it's more than 0, substring and cut out everything from the start of the string to the index of the city name as the second line of the address.
  4. Check the index of the state name. Once again, if it's -1, skip this step. If it's 0, substring and cut it out as the state name.
  5. Whatever remains is your postal code.
  6. Check the strings you just extracted for leftover separators (commas, dots, newlines, etc.) and strip them out.
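
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tested parser: the class name, the helper names, and the tiny lookup arrays are all made up for the example, and a real implementation would need far larger lists and word-boundary checks (a plain `indexOf("CA")` would also match inside a word like "CASTLE").

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeywordAddressSplitter {
    // Tiny illustrative lookup tables (assumptions for this sketch only).
    static final String[] STREET_WORDS = {"Street", "Avenue", "Road"};
    static final String[] CITIES = {"Los Angeles", "Mountain View"};
    static final String[] STATES = {"CA", "WA"};

    // Returns the candidate that occurs earliest in text, or null if none occurs.
    static String firstMatch(String text, String[] candidates) {
        String best = null;
        int bestIndex = -1;
        for (String c : candidates) {
            int i = text.indexOf(c);
            if (i >= 0 && (best == null || i < bestIndex)) {
                best = c;
                bestIndex = i;
            }
        }
        return best;
    }

    // Step 6: strips leading and trailing separators (commas, dots, whitespace, newlines).
    static String clean(String s) {
        return s.replaceAll("^[\\s,.]+|[\\s,.]+$", "");
    }

    static Map<String, String> split(String input) {
        Map<String, String> out = new LinkedHashMap<>();
        String rest = input;

        // Step 2: everything up to and including the street word is address line 1.
        String streetWord = firstMatch(rest, STREET_WORDS);
        if (streetWord != null) {
            int end = rest.indexOf(streetWord) + streetWord.length();
            out.put("address1", clean(rest.substring(0, end)));
            rest = rest.substring(end);
        }

        // Step 3: anything left before the city name is address line 2.
        String city = firstMatch(rest, CITIES);
        if (city != null) {
            int i = rest.indexOf(city);
            String before = clean(rest.substring(0, i));
            if (!before.isEmpty()) out.put("address2", before);
            out.put("city", city);
            rest = rest.substring(i + city.length());
        }

        // Step 4: cut the state out of what remains.
        String state = firstMatch(rest, STATES);
        if (state != null) {
            int i = rest.indexOf(state);
            out.put("state", state);
            rest = rest.substring(0, i) + rest.substring(i + state.length());
        }

        // Step 5: whatever remains is treated as the postal code.
        String zip = clean(rest);
        if (!zip.isEmpty()) out.put("zip", zip);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("123 Fake Street\nLos Angeles, CA 99988"));
    }
}
```

For the example input from the question, this yields address1 "123 Fake Street", city "Los Angeles", state "CA", and zip "99988". Parts that cannot be found are simply left out of the map.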

If the address is missing both state and city, you would actually need a list of zip codes too, so you'd better ensure the user enters at least one of them.

It's not impossible to implement what you need, but you probably don't want to waste all that time doing it. It's easier to just ensure the user enters everything correctly.

Upvotes: 0

Chris Stillwell

Reputation: 10547

If you are sure of the format, you can use regular expressions to get the address out of the string. For the example you provided, something like this:

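// Uses java.util.regex.Matcher and Pattern; split() would discard the matched text, so read the capture groups instead.
String address = "123 Fake Street\nLos Angeles, CA 99988";
Matcher m = Pattern.compile("(.*)\\n(.*), ([A-Z]{2}) ([0-9]{5})").matcher(address);
String[] parts = m.matches() ? new String[] {m.group(1), m.group(2), m.group(3), m.group(4)} : new String[0];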
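
For a self-contained version of the regex approach, here is a sketch that uses `Pattern` and `Matcher` with capture groups. The class and method names are made up for the example, and the pattern is one possible choice: it tolerates extra whitespace, any line-break style, and an optional ZIP+4 suffix, but still assumes the two-line US layout from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAddressParser {
    // Street line, line break, city, comma, two-letter state, 5-digit ZIP with optional +4.
    private static final Pattern US_ADDRESS = Pattern.compile(
            "\\s*(.+?)\\s*\\R\\s*(.+?)\\s*,\\s*([A-Z]{2})\\s+(\\d{5}(?:-\\d{4})?)\\s*");

    // Returns {street, city, state, zip}, or null when the input does not fit the pattern.
    static String[] parse(String input) {
        Matcher m = US_ADDRESS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String[] parts = parse("123 Fake Street\nLos Angeles, CA 99988");
        System.out.println(String.join(" | ", parts));
        // prints "123 Fake Street | Los Angeles | CA | 99988"
    }
}
```

Returning null for anything that does not match keeps the failure visible, which matters here because the question says the input will not be well structured.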

Upvotes: 1

Stuk4

Reputation: 33

Maybe you can use a regular expression.

Upvotes: -4
