DanielAttard

Reputation: 3615

Regex to select first occurrence of phone number from string

I am trying to create a regex that will grab the first occurrence of a phone number from this string:

<font color="#848484">Transferor(s)</font>
<br />Harzuz Holdings Ltd<br />Ontario Potato Inc&nbsp; &nbsp; &nbsp;905-791-7735<br />
<em>Clark Packaging Products Inc</em>
<p>
</p>Pres: Jay Burstein<br />8 Tracey Blvd, Unit 2<br />Brampton, Ontario<br />L6T 5R9<p>
</p>
<font color="#848484">Transferee(s)</font>
<br />2470347 Ontario Inc&nbsp; &nbsp; &nbsp;416-223-4403<br />

The phone numbers are always formatted like this: 999-999-9999. The problem I am having is that my regex grabs both phone numbers from the string when I only want the first one. This is what I have tried so far:

(\d\d\d-\d\d\d-\d\d\d\d) ? returns multiple phone numbers

(\d\d\d-\d\d\d-\d\d\d\d) {1} also returns multiple phone numbers

What regex can I use to select the first phone number? And what regex can I use to select the second phone number?

I am using uBot, which is a type of Windows automation software. This is the code I have tried, but neither of these lines is working for me:

set(#phone1, $find regular expression(#x, "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d ?"), "Global")
set(#phone2, $find regular expression(#x, "(\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d)\{1\}"), "Global")
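For comparison, here is a minimal sketch using Python's `re` module as a stand-in (an assumption for easy testing; it is not uBot or .NET) that shows why ` ?` and `{1}` do not help. Those quantifiers only repeat the token before them inside a single match. They have no effect on how many matches a global search returns. Choosing the first or second number is a matter of which match you keep, or of writing a pattern that skips past the first number. The variable names are illustrative only.

```python
import re

# Sample input copied from the question (no trailing newline).
TEXT = """<font color="#848484">Transferor(s)</font>
<br />Harzuz Holdings Ltd<br />Ontario Potato Inc&nbsp; &nbsp; &nbsp;905-791-7735<br />
<em>Clark Packaging Products Inc</em>
<p>
</p>Pres: Jay Burstein<br />8 Tracey Blvd, Unit 2<br />Brampton, Ontario<br />L6T 5R9<p>
</p>
<font color="#848484">Transferee(s)</font>
<br />2470347 Ontario Inc&nbsp; &nbsp; &nbsp;416-223-4403<br />"""

PHONE = r"\d{3}-\d{3}-\d{4}"

# A global search returns every match, whatever quantifier is on the group.
all_numbers = re.findall(PHONE, TEXT)

# search() stops at the first match, so this is the first phone number.
first = re.search(PHONE, TEXT).group(0)

# The second phone number is just the second element of the match list.
second = all_numbers[1]

# Regex-only alternative for the second number: consume the first number,
# lazily skip ahead, then capture the next one in group 1.
second_only = re.search(r"(?:" + PHONE + r")[\s\S]*?(" + PHONE + r")", TEXT).group(1)

print(all_numbers)   # ['905-791-7735', '416-223-4403']
print(first)         # 905-791-7735
print(second_only)   # 416-223-4403
```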

Upvotes: 0

Views: 280

Answers (2)

fronthem

Reputation: 4139

You already know how to match a telephone number; the real problem is deciding which match counts as the first one.

Rather than relying on which match comes "first", I would advise anchoring the telephone-number pattern to the keyword "Transferor". This is more semantic, and it gives the pattern:

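(?<=Transferor)[\s\S]*?(\d{3}-\d{3}-\d{4})

Explanation

(?<=Transferor) checks that the word "Transferor" appears before the match (a lookbehind, so it consumes no text),

[\s\S]*? any run of characters, including newlines, kept as short as possible (a greedy [\s\S]* would run to the end of the string and backtrack, so the group would capture the last phone number instead of the first),

(\d{3}-\d{3}-\d{4}) captures the telephone number.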

The Transferor's telephone number is then captured in group 1 ($1).

Note that the regex above is written in a generic form; adapt it to the .NET flavor if anything differs.
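A minimal sketch in Python's `re` module (used here as a stand-in for .NET; lookbehind, `[\s\S]` and lazy quantifiers should behave the same way for this pattern) comparing the greedy and lazy forms on the sample string from the question. Anchoring to "Transferee" in the same way picks out the second number. Variable names are illustrative.

```python
import re

# Sample input copied from the question (no trailing newline).
TEXT = """<font color="#848484">Transferor(s)</font>
<br />Harzuz Holdings Ltd<br />Ontario Potato Inc&nbsp; &nbsp; &nbsp;905-791-7735<br />
<em>Clark Packaging Products Inc</em>
<p>
</p>Pres: Jay Burstein<br />8 Tracey Blvd, Unit 2<br />Brampton, Ontario<br />L6T 5R9<p>
</p>
<font color="#848484">Transferee(s)</font>
<br />2470347 Ontario Inc&nbsp; &nbsp; &nbsp;416-223-4403<br />"""

# Greedy [\s\S]* runs to the end of the string and backtracks, so group 1
# ends up holding the LAST phone number after "Transferor".
greedy = re.search(r"(?<=Transferor)[\s\S]*(\d{3}-\d{3}-\d{4})", TEXT)

# Lazy [\s\S]*? stops as early as possible, so group 1 holds the FIRST one.
lazy = re.search(r"(?<=Transferor)[\s\S]*?(\d{3}-\d{3}-\d{4})", TEXT)

# Same idea anchored to the second keyword gives the second number.
transferee = re.search(r"(?<=Transferee)[\s\S]*?(\d{3}-\d{3}-\d{4})", TEXT)

print(greedy.group(1))      # 416-223-4403
print(lazy.group(1))        # 905-791-7735
print(transferee.group(1))  # 416-223-4403
```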

Upvotes: 1

Nir Alfasi

Reputation: 53535

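Since this is a multiline string, it contains newline characters, and you can use them to tell the first occurrence apart from the last one. The first number sits on a line that ends with a newline (\n), while the last number is on the final line, which has no newline after it in the sample string: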

(\d{3}-\d{3}-\d{4})(?=.*\n)

Tested with a .NET regex tester.
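A minimal sketch of the same lookahead in Python's `re` module, using the question's sample string and assuming it has no trailing newline. As in .NET (without single-line mode), `.` does not match `\n`, so `(?=.*\n)` only succeeds when a newline occurs later on the same line as the number. The second call shows the catch: if the input ends with a newline, the last number matches as well.

```python
import re

# Sample input copied from the question (no trailing newline).
TEXT = """<font color="#848484">Transferor(s)</font>
<br />Harzuz Holdings Ltd<br />Ontario Potato Inc&nbsp; &nbsp; &nbsp;905-791-7735<br />
<em>Clark Packaging Products Inc</em>
<p>
</p>Pres: Jay Burstein<br />8 Tracey Blvd, Unit 2<br />Brampton, Ontario<br />L6T 5R9<p>
</p>
<font color="#848484">Transferee(s)</font>
<br />2470347 Ontario Inc&nbsp; &nbsp; &nbsp;416-223-4403<br />"""

# "." does not cross newlines, so the lookahead requires a "\n" later on the
# SAME line as the number.
pattern = re.compile(r"(\d{3}-\d{3}-\d{4})(?=.*\n)")

print(pattern.findall(TEXT))         # ['905-791-7735']

# With a trailing newline the final line also qualifies, so both numbers match.
print(pattern.findall(TEXT + "\n"))  # ['905-791-7735', '416-223-4403']
```

This makes the approach depend on how the input string ends, so it is worth checking what uBot actually passes in.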

Upvotes: 1
