mox_mox

Reputation: 121

Writing a more versatile RegEx for phone number capture

I am trying to retrieve phone numbers from the web with Kotlin and JSoup, but I am having trouble getting the regex right. My most effective attempt so far has been:

val pattern = Pattern.compile("\\+[0-9.()-]{7,15}")

val numbers = doc.getElementsMatchingOwnText(pattern)
    .flatMap {
        pattern.toRegex()
            .find(it.toString())
            ?.groups
            ?.map { it!!.value }!!
            .asIterable()
    }

This captures numbers that match the +1-###-###-#### format but fails to capture:

+1 (###) ###-#### 
+1 (###)###-####
(###)###-####

and other North American phone number formats. I have also tried this pattern:

((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}

and several others from a regex library I found online, but none of them work. The site says that it uses the JavaScript engine. Could that be why they are not working?

I would appreciate any help finding a pattern that captures as many North American phone number formats as possible, or resources that would help me learn to write my own. Thanks for any help.

Upvotes: 3

Views: 5138

Answers (2)

eNgE

Reputation: 1

I'm just getting started, but when you set the length you didn't account for space characters, and your character class doesn't include a space at all. Also, characters such as + ( ) and - have special meaning in regex and are escaped with \ to match them literally (of these, only - can need it inside a character class, but escaping the others there is harmless).

Adriano's pattern is very specific. Depending on how much exactness matters, you could also try a simpler version that I came up with quickly; it is similar to yours but covers the cases I mentioned above.

[\\+0-9\\(\\)\\- ]{7,19}

Hopefully, the above came out right. Again, as Adriano said, make sure you use the right escape characters: usually a single \ in the regex itself, or a double \\ when the regex is written inside a regular Kotlin / Java string literal.
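To see how the looser character-class pattern above behaves, here is a minimal sketch that runs it against the four formats from the question. The names `loose` and `samples` are just illustrative. The last line shows the trade-off: any run of 7 to 19 of the allowed characters matches, whether or not it is a phone number.

```kotlin
// The loose pattern from this answer, written as a regular Kotlin string literal.
val loose = Regex("[\\+0-9\\(\\)\\- ]{7,19}")

fun main() {
    val samples = listOf(
        "+1-555-555-5555",
        "+1 (555) 555-5555",
        "+1 (555)555-5555",
        "(555)555-5555"
    )
    for (s in samples) {
        println("$s -> ${loose.find(s)?.value}")
    }
    // Caveat: a year range is made of allowed characters too, so it also matches.
    println(loose.matches("2019-2020"))
}
```

If false positives like that matter, a structured pattern that spells out the digit groups is the safer choice.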

Upvotes: 0

Adriano Martins

Reputation: 1808

Whenever you take a regex from an online source, check how characters must be escaped in the language you are using.

Most online regex tools do not offer an export to Java / Kotlin syntax, so a pattern copied from them will not work as is. In a regular Kotlin / Java string literal, you need a double backslash (\\) to properly escape a character.
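As a small illustration of the escaping point, the sketch below writes the second pattern from the question in two ways: once as a regular string literal with doubled backslashes, and once as a Kotlin raw (triple-quoted) string, where the pattern can be pasted with single backslashes. The names `escaped` and `raw` are only for this example.

```kotlin
// Regular string literal: every regex backslash has to be doubled.
val escaped = Regex("((\\(\\d{3}\\) ?)|(\\d{3}-))?\\d{3}-\\d{4}")

// Raw string: backslashes are passed to the regex engine unchanged.
val raw = Regex("""((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}""")

fun main() {
    // Both literals produce the same pattern text.
    println(escaped.pattern == raw.pattern)
    println(raw.find("(555) 555-5555")?.value)
}
```

A raw string is often the easier route when copying patterns from an online library, as long as the pattern contains no `$` that Kotlin would read as a string template.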

With that explained, here is a working regex that matches all the strings you gave:

(\\+\\d( )?)?([-\\( ]\\d{3}[-\\) ])( )?\\d{3}-\\d{4}

Test code:

fun main() {
    // Compile the regex once and reuse it for every test string.
    val regex = Regex("(\\+\\d( )?)?([-\\( ]\\d{3}[-\\) ])( )?\\d{3}-\\d{4}")
    val testStrings = listOf(
        "+1-555-555-5555",   // +1-###-###-####
        "+1 (555) 555-5555", // +1 (###) ###-####
        "+1 (555)555-5555",  // +1 (###)###-####
        "(555)555-5555",     // (###)###-####
        "(55)5555-555"       // Not a valid format, so find() returns null
    )
    for (s in testStrings) {
        println("matched: " + regex.find(s)?.value)
    }
}

Side note: most likely there is a better regex; I wrote this one just to match the strings you provided.
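As one possible direction for a broader pattern, here is a hedged sketch. It is not a complete validator: it assumes 10-digit North American numbers with an optional +1 or 1 prefix and separators limited to space, dot, or hyphen. The names `nanpRegex` and `findPhoneNumbers` are hypothetical. It uses `findAll` rather than `find`, so every number in a piece of text is returned, not just the first.

```kotlin
// Hypothetical broader pattern; assumes 10 digits, optional +1 / 1 prefix,
// and space, dot, or hyphen as separators. Lookarounds keep it from matching
// inside a longer run of digits.
val nanpRegex = Regex("""(?<!\d)(?:\+?1[-. ]?)?(?:\(\d{3}\)|\d{3})[-. ]?\d{3}[-. ]?\d{4}(?!\d)""")

// Returns every phone-like substring in the text, in order of appearance.
fun findPhoneNumbers(text: String): List<String> =
    nanpRegex.findAll(text).map { it.value }.toList()

fun main() {
    val text = "Call +1 (555) 555-5555 or (555)555-1234; fax 555.555.9876."
    findPhoneNumbers(text).forEach { println(it) }
}
```

The input string here is a stand-in; with Jsoup, the text of the page (for example from `doc.text()`) could be passed in instead.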

Upvotes: 1
