Reputation: 121
I am trying to retrieve phone numbers from the web with Kotlin and JSoup, but I am having some trouble getting the regex right. My most effective attempt so far has been:
val pattern = Pattern.compile("\\+[0-9.()-]{7,15}")
val numbers = doc.getElementsMatchingOwnText(pattern)
.flatMap {
pattern.toRegex()
.find(it.toString())
?.groups
?.map {
it!!.value
}!!.asIterable()
}
This captures numbers that match the +1-###-###-#### format but fails to capture:
+1 (###) ###-####
+1 (###)###-####
(###)###-####
and other North American phone number formats. I have also tried this pattern:
((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}
and several others from a regex library I found online, but they are not working. The site says that it uses the JavaScript engine. Is that possibly why they are not working?
I would appreciate any help finding a pattern that captures as many North American phone number formats as possible, or pointers to resources that would help me learn to write my own. Thanks for any help.
Upvotes: 3
Views: 5138
Reputation: 1
I'm just getting started myself, but your character class and the {7,15} length don't account for space characters, so formats that contain spaces can't match. Also, characters such as + ( ) and - have special meaning in regex, so escape them with \ when you want to match them literally. (Inside a character class only - strictly needs this, but escaping the others is harmless.)
Adriano was very specific with his matching. Depending on how important exactness is, you could also try a simpler version that I came up with quickly. It is similar to yours but covers the cases I mentioned above.
[\\+0-9\\(\\)\\- ]{7,19}
Hopefully, the above came out right. Again, as Adriano said, you need to make sure you have the right escape characters: usually either a single \ or a double \\
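To make the "how important is exactness" trade-off concrete, here is a minimal sketch that runs the loose character-class pattern above against a few inputs. The names loosePhone and looseMatch are made up for illustration. It shows that the pattern picks up the formats from the question, but also any other run of 7 to 19 digits, spaces, or punctuation.

```kotlin
// Loose pattern from this answer: any run of 7 to 19 characters drawn from
// '+', digits, '(', ')', '-' and space. (Hypothetical names, for illustration.)
val loosePhone = Regex("[\\+0-9\\(\\)\\- ]{7,19}")

// Returns the first run that fits the pattern, or null if there is none.
fun looseMatch(text: String): String? = loosePhone.find(text)?.value

fun main() {
    println(looseMatch("+1 (555) 555-5555"))  // a format from the question
    println(looseMatch("(555)555-5555"))      // another format from the question
    println(looseMatch("id=20240115;"))       // not a phone number, but still a match
}
```

The last call returns 20240115, so if false positives matter, a stricter pattern is probably the better choice.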
Upvotes: 0
Reputation: 1808
Whenever you take a regex from an online source, check how characters must be escaped in the language you are using.
Most online regex tools do not export to Java / Kotlin syntax, so a pattern copied from them will not work as is. In a regular Kotlin / Java string literal, you need a double backslash (\\) to properly escape a character.
With that explained, here is a working regex that matches all the strings you gave:
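As a small illustration of the escaping point, here is a sketch that writes the second pattern from the question two ways: once as a regular string with doubled backslashes, and once as a Kotlin raw string (triple quotes), where the pattern can be pasted with single backslashes. The names escapedPattern, rawPattern and firstMatch are made up for this example.

```kotlin
// The second pattern from the question, written two ways (hypothetical names).
// Regular string literal: every regex backslash has to be doubled.
val escapedPattern = "((\\(\\d{3}\\) ?)|(\\d{3}-))?\\d{3}-\\d{4}"
// Raw string literal: backslashes are taken literally, so no doubling.
val rawPattern = """((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}"""

// Returns the first match of the pattern in the text, or null.
fun firstMatch(pattern: String, text: String): String? =
    Regex(pattern).find(text)?.value

fun main() {
    println(escapedPattern == rawPattern)                 // both spell the same regex
    println(firstMatch(rawPattern, "(555) 555-5555"))
    println(firstMatch(rawPattern, "+1 (555) 555-5555"))  // matches, but without the "+1 "
}
```

So the online pattern itself seems fine once the backslashes survive; it simply has no part that matches the leading +1.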
(\\+\\d( )?)?([-\\( ]\\d{3}[-\\) ])( )?\\d{3}-\\d{4}
Test code:
fun main() {
    // Backslashes are doubled because this is a regular Kotlin string literal.
    // Compile the regex once instead of calling toRegex() for every test string.
    val regex = Regex("(\\+\\d( )?)?([-\\( ]\\d{3}[-\\) ])( )?\\d{3}-\\d{4}")
    val testStrings = listOf(
        "+1-555-555-5555",   // +1-###-###-####
        "+1 (555) 555-5555", // +1 (###) ###-####
        "+1 (555)555-5555",  // +1 (###)###-####
        "(555)555-5555",     // (###)###-####
        "(55)5555-555"       // Not valid format, prints "matched: null"
    )
    for (test in testStrings) {
        println("matched: " + regex.find(test)?.value)
    }
}
Side note: most likely there is a better regex. I only wrote this one to match all the strings you provided.
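If you need every number in a block of text rather than only the first one, a sketch along these lines might help. It uses Regex.findAll and takes each match's value, instead of find plus groups as in the question (where optional groups that did not participate come back as null). The names phoneRegex and extractPhoneNumbers are made up, and the sample text is invented. In the real scraper the input would presumably come from something like JSoup's Element.text().

```kotlin
// Same regex as in this answer, compiled once (hypothetical names).
val phoneRegex = Regex("(\\+\\d( )?)?([-\\( ]\\d{3}[-\\) ])( )?\\d{3}-\\d{4}")

// Collects the full text of every match, in order of appearance.
fun extractPhoneNumbers(text: String): List<String> =
    phoneRegex.findAll(text).map { it.value }.toList()

fun main() {
    // Invented sample text; in practice this would be the scraped page text.
    val sample = "Call +1-555-555-5555 or (555)555-1234. Fax: +1 (555) 555-9876."
    extractPhoneNumbers(sample).forEach(::println)
}
```

For the sample above this finds all three numbers. Text with no phone number gives an empty list, so no null handling is needed.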
Upvotes: 1