Reputation: 29
I am trying to figure out the best way to take an address line and split it into three fields for a file: house number, street name, and apartment number. Thankfully, the city, state, and zip are already in their own columns, so those three fields are all I have to parse out, but even that is proving difficult. My initial hope was to do this in COBOL using SQL, but I don't think I can use the PATINDEX example someone posted on a separate question thread; I kept getting SQL code -440. My second thought was to do this in Java, treating the strings as arrays and checking them for numbers, then letters, then comparing for "Apt" or something to that effect. I have the following so far as a test of what I'm ultimately trying to do, but I am getting an out-of-bounds exception for the array.
class AddressTest{
    public static void main (String[] arguments){
        String adr1 = "100 village rest court";
        String adr2 = "1000 Arbor lane Apt. 21-D";
        String[] HouseNbr = new String[9];
        String[] Street = new String[20];
        String[] Apt = new String[5];
        for(int i = 0; i < adr1.length();i++){
            String[] forloop = new String[] {adr1};
            if (forloop[i].substring(0,1).matches("[0-9]")){
                if(forloop[i+1].substring(0,1).matches("[0-9]")){
                    HouseNbr[i] = forloop[i];
                }
                else if(forloop[i+1].substring(0,1).matches(" ")){
                }
                else if(forloop[i].substring(0,1).matches(" ")){
                }
                else{
                    Street[i] = forloop[i];
                }
            }
        }
        for(int j = 0; j < HouseNbr.length; j++){
            System.out.println(HouseNbr[j]);
        }
        for(int k = 0; k < Street.length; k++){
            System.out.println(Street[k]);
        }
    }
}
Any other thoughts would be extremely helpful.
Upvotes: 1
Views: 8448
Reputation: 29
I am still working on it, but for anyone in the future who may need to do this:
import java.util.Arrays;
import java.util.StringTokenizer;
import org.apache.commons.lang3.StringUtils;

class AddressTest{
    public static void main (String[] arguments){
        String adr1 = "100 village rest court";
        //String adr2 = "1000 Arbor lane Apt. 21-D";
        String reader = "";
        String holder = "";
        StringTokenizer a1 = new StringTokenizer(adr1);
        String[] HouseNbr = new String[9];
        String[] StreetName = new String[20];
        String[] Apartment = new String[5];
        int counter = 0;
        while(a1.hasMoreElements()){
            reader = a1.nextElement().toString();
            System.out.println("Reader: " + reader);
            if(StringUtils.isNumeric(reader)){
                // Copy the digits one per slot. charAt is used instead of split(""):
                // since Java 8, split("") no longer returns a leading empty string,
                // so 1-based indexing into its result runs past the end of the array.
                for(int i = 0; i < reader.length() && i < HouseNbr.length; i++){
                    HouseNbr[i] = String.valueOf(reader.charAt(i));
                    System.out.println("HNBR:_" + HouseNbr[i]);
                }
            }
            else if(StringUtils.startsWith(reader, "Apt.")){
                // The apartment value is the token that follows "Apt.", if there is one
                if(a1.hasMoreElements()){
                    holder = a1.nextElement().toString();
                    for(int j = 0; j < holder.length() && j < Apartment.length; j++){
                        Apartment[j] = String.valueOf(holder.charAt(j));
                    }
                }
            }
            else{
                // Anything else is part of the street name, up to the array's capacity
                for(int k = 0; k < reader.length() && counter < StreetName.length; k++){
                    StreetName[counter] = String.valueOf(reader.charAt(k));
                    counter++;
                }
                // Separate words with a space while tokens remain
                if((counter < StreetName.length) && a1.hasMoreElements()){
                    StreetName[counter] = " ";
                    counter++;
                }
            }
        }
        System.out.println(Arrays.toString(HouseNbr) + " " + Arrays.toString(StreetName)
            + " " + Arrays.toString(Apartment));
    }
}
Upvotes: 1
Reputation: 4263
If you leverage the freely available U.S. Postal Service ZIP Code lookup (https://tools.usps.com/go/ZipLookupAction!input.action), you can get the address back in a standardized format. The valid options in that format are documented by the USPS, which makes it easier to write one very complicated regex, or a number of simple regexes, to read the standard form.
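As a rough sketch of the regex idea, assuming the line already has the shape "house number, street name, optional unit", one pattern with three capture groups might look like the code below. The class name, the method name, and the short list of unit designators (Apt, Unit, Ste, #) are assumptions made for illustration; they are not the complete USPS list, and real addresses will need more cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressRegexSketch {
    // Group 1: leading digits (house number).
    // Group 2: street name (reluctant, so it stops before a unit designator).
    // Group 3: optional unit value after an assumed designator: Apt, Unit, Ste or #.
    private static final Pattern ADDRESS = Pattern.compile(
            "\\s*(\\d+)\\s+(.+?)(?:\\s+(?:(?:apt|unit|ste)\\.?\\s+|#\\s*)(\\S+))?\\s*",
            Pattern.CASE_INSENSITIVE);

    /** Returns {houseNumber, streetName, unit}; unit is "" when absent, null if no match. */
    static String[] parse(String line) {
        Matcher m = ADDRESS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String unit = m.group(3) == null ? "" : m.group(3);
        return new String[] { m.group(1), m.group(2), unit };
    }

    public static void main(String[] args) {
        // The two sample addresses from the question.
        String[] samples = { "100 village rest court", "1000 Arbor lane Apt. 21-D" };
        for (String line : samples) {
            String[] parts = parse(line);
            System.out.println(parts == null ? "no match: " + line : String.join(" | ", parts));
        }
    }
}
```

Returning null for lines that do not match keeps the unparseable addresses easy to collect for manual review.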
Upvotes: 1
Reputation: 1955
I would consider removing the unnecessary arrays and using a StringTokenizer:
import java.util.StringTokenizer;
import org.apache.commons.lang3.StringUtils;

class TokenizerExample {
    public static void main(String[] args) {
        String number;
        String address;
        String aptNumber;
        String str = "This is String , split by StringTokenizer";
        StringTokenizer st = new StringTokenizer(str);
        System.out.println("---- Split by space ------");
        while (st.hasMoreElements()) {
            String s = st.nextToken();
            System.out.println(s);
            if (StringUtils.isNumeric(s)) {
                // An all-digit token is taken as the house number
                number = s;
                continue;
            }
            int aptIndex = s.indexOf("Apt");
            if (aptIndex >= 0) {
                // Keep everything from "Apt" to the end of the token
                aptNumber = s.substring(aptIndex);
                continue;
            }
        }
        System.out.println("---- Split by comma ',' ------");
        StringTokenizer st2 = new StringTokenizer(str, ",");
        while (st2.hasMoreElements()) {
            System.out.println(st2.nextElement());
        }
    }
}
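The sample above tokenizes a placeholder sentence rather than an address. As a minimal sketch of the same idea applied to the two addresses from the question, using plain Strings instead of arrays and only the standard library, something like the following might work. The class and method names are hypothetical, and it assumes the house number is the first token and that the apartment value is the single token after "Apt" or "Apt.".

```java
import java.util.StringTokenizer;

class AddressTokenizerSketch {
    /** Returns {houseNumber, streetName, apartment}; missing parts are "". */
    static String[] split(String line) {
        String number = "";
        String apt = "";
        StringBuilder street = new StringBuilder();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (number.isEmpty() && street.length() == 0 && token.matches("\\d+")) {
                // Assumption: a leading all-digit token is the house number.
                number = token;
            } else if (token.equalsIgnoreCase("Apt") || token.equalsIgnoreCase("Apt.")) {
                // Assumption: the apartment value is the next token, if any.
                apt = st.hasMoreTokens() ? st.nextToken() : "";
            } else {
                // Everything else is collected as the street name.
                if (street.length() > 0) {
                    street.append(' ');
                }
                street.append(token);
            }
        }
        return new String[] { number, street.toString(), apt };
    }

    public static void main(String[] args) {
        String[] samples = { "100 village rest court", "1000 Arbor lane Apt. 21-D" };
        for (String line : samples) {
            String[] parts = split(line);
            System.out.println("number=" + parts[0] + ", street=" + parts[1] + ", apt=" + parts[2]);
        }
    }
}
```

Other designators such as "Unit" or "#" are not handled here and would end up in the street name.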
Upvotes: 1