James Phillips

Reputation: 33

Regex to parse phone numbers in a text document with Java

I'm trying to use a regex to find phone numbers of the form (xxx) xxx-xxxx in a text document full of messy HTML.

The text file has lines like:

  <div style="font-weight:bold;">
  <div>
   <strong>Main Phone:
   <span style="font-weight:normal;">(713) 555-9539&nbsp;&nbsp;&nbsp;&nbsp;
   <strong>Main Fax:
   <span style="font-weight:normal;">(713) 555-9541&nbsp;&nbsp;&nbsp;&nbsp;
   <strong>Toll Free:
   <span style="font-weight:normal;">(888) 555-9539

and my code contains:

Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = p.matcher(line); //from buffered reader, reading 1 line at a time

if (m.matches()) {
     stringArray.add(line);
}

The problem is that even when I compile very simple patterns, nothing is returned. And if it doesn't even recognize something like \d, how am I going to match a telephone number? For example:

Pattern p = Pattern.compile("\\d+"); //Returns nothing
Pattern p = Pattern.compile("\\d");  //Returns nothing
Pattern p = Pattern.compile("\\s+"); //Returns lines
Pattern p = Pattern.compile("\\D");  //Returns lines

This is really confusing to me, and any help would be appreciated.

Upvotes: 3

Views: 2177

Answers (2)

Khozzy

Reputation: 1103

Alternatively, instead of a regex you can use Google's libphonenumber library, as follows:

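    import java.util.HashSet;
    import java.util.Set;
    import com.google.i18n.phonenumbers.PhoneNumberMatch;
    import com.google.i18n.phonenumbers.PhoneNumberUtil;

    // Assumes `source` holds the document text. "US" is the default region used
    // for numbers written without a +country code, such as (713) 555-9539;
    // with a null region only international-format numbers can be parsed.
    Set<String> phones = new HashSet<>();
    PhoneNumberUtil util = PhoneNumberUtil.getInstance();

    for (PhoneNumberMatch match : util.findNumbers(source, "US")) {
        phones.add(match.rawString());
    }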

Upvotes: 2

Ravi K Thapliyal

Reputation: 51721

Use Matcher#find() instead of matches(). matches() succeeds only if the complete line is a phone number, whereas find() searches the line and returns true for sub-string matches as well.
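A minimal sketch of the difference, using a shortened version of one of the lines from the question. The class and method names here are only illustrative:

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // Same pattern as in the question: (xxx) xxx-xxxx
    static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // matches(): true only if the ENTIRE input is a phone number
    static boolean wholeLineIsPhone(String line) {
        return PHONE.matcher(line).matches();
    }

    // find(): true if a phone number occurs anywhere in the input
    static boolean containsPhone(String line) {
        return PHONE.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "   <span style=\"font-weight:normal;\">(713) 555-9539&nbsp;&nbsp;";
        System.out.println(wholeLineIsPhone(line));             // false
        System.out.println(containsPhone(line));                // true
        System.out.println(wholeLineIsPhone("(713) 555-9539")); // true
    }
}
```

This is also why "\\d+" returned nothing in the question while "\\s+" returned some lines: with matches(), "\\d+" only succeeds on a line made up entirely of digits, and "\\s+" only on a whitespace-only line.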

Matcher m = p.matcher(line);

Also, the line above suggests you're creating the same Pattern and a new Matcher on every iteration of your loop. That's not efficient. Compile the Pattern once outside the loop, then reset and reuse a single Matcher for each line.

Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

// Create the Matcher once on an empty input, then point it at each new line.
Matcher m = p.matcher("");

String line;
while ((line = reader.readLine()) != null) {
  // reset(line) returns the same Matcher, so find() can be chained.
  if (m.reset(line).find()) {
    stringArray.add(line);
  }
}
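If you want the numbers themselves rather than the whole lines, one possible variation is to loop on find() and collect group(). The sketch below assumes the lines are already in memory; the class name, method name, and sample lines are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneExtractor {
    private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Returns every (xxx) xxx-xxxx match found in the given lines, in order.
    static List<String> extractPhones(Iterable<String> lines) {
        List<String> phones = new ArrayList<>();
        Matcher m = PHONE.matcher(""); // one Matcher, reset for each line
        for (String line : lines) {
            m.reset(line);
            while (m.find()) {         // a line may hold more than one number
                phones.add(m.group()); // the matched text only, not the whole line
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "   <strong>Main Phone:",
            "   <span style=\"font-weight:normal;\">(713) 555-9539&nbsp;&nbsp;",
            "   <span style=\"font-weight:normal;\">(888) 555-9539");
        System.out.println(extractPhones(lines)); // [(713) 555-9539, (888) 555-9539]
    }
}
```

Note that \s matches an ordinary space but not the literal text &nbsp;, so this only works when the area code and the rest of the number are separated by a real whitespace character, as in the sample lines.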

Upvotes: 3
