DAN212

Reputation: 51

Java - Regex for validating US postcodes from a text file

I'm having trouble coming up with a regex that can validate US postcodes (about 10,000 of them) in their current form in my text file. My program currently uses the validator I wrote for UK postcodes. I'm very stuck on this and don't know how to proceed.

package postcodesort;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Currently validates UK postcodes (e.g. "SW1A 1AA"), not US ZIP codes
public class ZipCodeValidator {
    private static final String regex = "^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$";
    private static final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

    // matches() requires the regex to cover the entire input string
    public boolean isValid(String zipCode) {
        Matcher matcher = pattern.matcher(zipCode);
        return matcher.matches();
    }
}

Below is a small example of the data in my text file.

"01","35005","AL","ADAMSVILLE",86.959727,33.588437,10616,0.002627

"05","72001","AR","ADONA",92.903325,35.046956,494,0.00021

"06","90804","CA","SIGNAL HILL",118.155187,33.782993,36092,0.001213

I want it to read only the first three fields, so "01","35005","AL" is read and validated while the rest of the line is ignored. As long as those fields contain two digits, five digits and two letters respectively, it should count as a valid postcode. I don't know how to make this happen.

Any and all help is appreciated!

Upvotes: 1

Views: 1811

Answers (3)

Ro Yo Mi

Reputation: 15000

Description

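^"([0-9]{2})","([0-9]{5})","([a-z]{2})"

Note: [a-z]{2} only matches upper-case state codes such as AL when the pattern is case-insensitive (the i flag on regex101, or Pattern.CASE_INSENSITIVE in Java, as in the question's code). Without that flag, use [A-Z]{2} instead.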


This regular expression will do the following:

  • Reads the first three fields of the line
  • Validates that the first value has two digits
  • Validates that the second value has five digits
  • Validates that the third value has two letters
  • Captures the individual values
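A minimal Java sketch of using this regex on one line of the file, assuming the pattern is compiled case-insensitively. The class and method names (ZipLineValidator, isValid) are hypothetical. One detail matters here: the question's isValid calls matches(), which requires the regex to cover the whole line, so it would reject these lines because of the trailing city and coordinate fields. lookingAt() only requires a match starting at the beginning of the input.

```java
import java.util.regex.Pattern;

// Hypothetical helper: validates only the first three quoted fields of a line.
class ZipLineValidator {
    // CASE_INSENSITIVE lets [a-z]{2} accept upper-case state codes like "AL".
    private static final Pattern LINE_PREFIX = Pattern.compile(
            "^\"([0-9]{2})\",\"([0-9]{5})\",\"([a-z]{2})\"",
            Pattern.CASE_INSENSITIVE);

    // lookingAt() anchors at the start of the line but, unlike matches(),
    // does not need to consume the rest of it, so the remaining fields
    // (city, coordinates, ...) are ignored.
    static boolean isValid(String line) {
        return LINE_PREFIX.matcher(line).lookingAt();
    }

    public static void main(String[] args) {
        String good = "\"01\",\"35005\",\"AL\",\"ADAMSVILLE\",86.959727,33.588437,10616,0.002627";
        String bad = "\"1\",\"35005\",\"AL\",\"ADAMSVILLE\"";
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false: first field has one digit
    }
}
```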

Example

Live Demo

https://regex101.com/r/bO7qK7/1

Sample text

"01","35005","AL","ADAMSVILLE",86.959727,33.588437,10616,0.002627
"05","72001","AR","ADONA",92.903325,35.046956,494,0.00021
"06","90804","CA","SIGNAL HILL",118.155187,33.782993,36092,0.001213

Sample Matches

  • Capture group 0 gets the entire match, i.e. the first three values including their quotes
  • Capture group 1 gets the value inside quotes for the first value
  • Capture group 2 gets the value inside quotes for the second value
  • Capture group 3 gets the value inside quotes for the third value

[0][0] = "01","35005","AL"
[0][1] = 01
[0][2] = 35005
[0][3] = AL

[1][0] = "05","72001","AR"
[1][1] = 05
[1][2] = 72001
[1][3] = AR

[2][0] = "06","90804","CA"
[2][1] = 06
[2][2] = 90804
[2][3] = CA
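To get these capture groups out of the whole file in Java, one possible sketch reads it line by line and keeps groups 1 to 3 of every matching line. The names ZipLineParser and parse are hypothetical, as is the choice to silently skip lines that don't match (you may prefer to log or reject them). parse takes any Reader so it can be tried on a string; for the real file you would pass something like Files.newBufferedReader(path).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZipLineParser {
    // Same regex as above, compiled case-insensitively so [a-z]{2} accepts "AL".
    private static final Pattern LINE_PREFIX = Pattern.compile(
            "^\"([0-9]{2})\",\"([0-9]{5})\",\"([a-z]{2})\"",
            Pattern.CASE_INSENSITIVE);

    // Returns {group 1, group 2, group 3} for every line whose first three
    // fields match; non-matching lines are skipped (an assumption).
    static List<String[]> parse(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = LINE_PREFIX.matcher(line);
                if (m.lookingAt()) {
                    rows.add(new String[] {m.group(1), m.group(2), m.group(3)});
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "\"01\",\"35005\",\"AL\",\"ADAMSVILLE\",86.959727,33.588437,10616,0.002627\n"
                + "\"05\",\"72001\",\"AR\",\"ADONA\",92.903325,35.046956,494,0.00021\n";
        for (String[] row : parse(new StringReader(data))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Running main prints 01 | 35005 | AL and then 05 | 72001 | AR, mirroring groups 1 to 3 in the sample matches.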

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ","                      '","'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [0-9]{5}                 any character of: '0' to '9' (5 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  ","                      '","'
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [a-z]{2}                 any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------

Upvotes: 1

Ro Yo Mi

Reputation: 15000

Description

^(?:"[^"]*",){1}"([^"]*)"


This regular expression will do the following:

  • Returns the postal code, which appears to be the second quoted field
  • The (?:"[^"]*",){1} construct is arguably overkill for this particular problem, but it lets you specify exactly how many quote-comma delimited values to skip before returning the actual value. In this case we're skipping one field
  • Returns the value without the surrounding quotes
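Since the {1} count is what selects the field, it can be turned into a parameter. The following is a sketch under that idea; QuotedField and fieldAt are hypothetical names, and it only works while every field being skipped is quoted (the numeric columns later in the line are not).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedField {
    // Returns the n-th (0-based) quoted field, or null if it can't be reached
    // by skipping n quoted, comma-terminated fields from the start of the line.
    static String fieldAt(String line, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        // Same construct as above with {1} replaced by {n}. The pattern is
        // compiled on every call for simplicity; cache it if n is fixed.
        Pattern p = Pattern.compile("^(?:\"[^\"]*\",){" + n + "}\"([^\"]*)\"");
        Matcher m = p.matcher(line);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "\"06\",\"90804\",\"CA\",\"SIGNAL HILL\",118.155187,33.782993,36092,0.001213";
        System.out.println(fieldAt(line, 1)); // 90804
        System.out.println(fieldAt(line, 3)); // SIGNAL HILL
    }
}
```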

Example

Live Demo

https://regex101.com/r/cJ6iE9/1

Sample text

"01","35005","AL","ADAMSVILLE",86.959727,33.588437,10616,0.002627
"05","72001","AR","ADONA",92.903325,35.046956,494,0.00021
"06","90804","CA","SIGNAL HILL",118.155187,33.782993,36092,0.001213

Sample Matches

  • Capture group 0 gets the entire match, from the start of the line through the closing quote of the zip code field
  • Capture group 1 gets the zip code
[0][0] = "01","35005"
[0][1] = 35005

[1][0] = "05","72001"
[1][1] = 72001

[2][0] = "06","90804"
[2][1] = 90804

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 times):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    ",                       '",'
----------------------------------------------------------------------
  ){1}                     end of grouping
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------

Upvotes: 0

Chetan Jadhav CD

Reputation: 1146

If you are passing only the 5-digit ZIP code, you can validate it by changing your regex to the following:

^[0-9]{5}$

If some entries use the ZIP+4 format, e.g. 12345-3333, use the following regex instead:

^[0-9]{5}(?:-[0-9]{4})?$
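As a sketch, this is the question's validator class with the UK pattern swapped for the ZIP / ZIP+4 pattern above. UsZipCodeValidator is a hypothetical name, and the class assumes it is given the bare ZIP value (for example the zip code captured by the regexes in the other answers), not the whole quoted CSV line.

```java
import java.util.regex.Pattern;

class UsZipCodeValidator {
    // 5 digits, optionally followed by a hyphen and 4 more digits (ZIP+4).
    private static final Pattern ZIP = Pattern.compile("^[0-9]{5}(?:-[0-9]{4})?$");

    // matches() requires the whole string to be a ZIP code, so a value
    // still wrapped in quotes (e.g. "\"35005\"") is rejected.
    public boolean isValid(String zipCode) {
        return zipCode != null && ZIP.matcher(zipCode).matches();
    }

    public static void main(String[] args) {
        UsZipCodeValidator v = new UsZipCodeValidator();
        System.out.println(v.isValid("35005"));      // true
        System.out.println(v.isValid("12345-3333")); // true
        System.out.println(v.isValid("3500"));       // false
    }
}
```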

However, if you want to validate the name of the city against the zipcode provided, you might want to have a look at the discussion here

Upvotes: 0
