Reputation: 3615
I am trying to create a regex that will grab the first occurrence of a phone number from this string:
<font color="#848484">Transferor(s)</font>
<br />Harzuz Holdings Ltd<br />Ontario Potato Inc 905-791-7735<br />
<em>Clark Packaging Products Inc</em>
<p>
</p>Pres: Jay Burstein<br />8 Tracey Blvd, Unit 2<br />Brampton, Ontario<br />L6T 5R9<p>
</p>
<font color="#848484">Transferee(s)</font>
<br />2470347 Ontario Inc 416-223-4403<br />
The phone numbers are always consistently formatted like this: 999-999-9999. The problem I am having is that my regex is grabbing both phone numbers from my string when I only want to grab the first one. This is what I have tried so far:
(\d\d\d-\d\d\d-\d\d\d\d) ?
returns multiple phone numbers
(\d\d\d-\d\d\d-\d\d\d\d) {1}
also returns multiple phone numbers
What regex can I use to select the first phone number? And what regex can I use to select the second phone number?
I am using uBot, which is a type of Windows automation software. This is the code that I have tried, but neither of these lines is working for me:
set(#phone1, $find regular expression(#x, "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d ?"), "Global")
set(#phone2, $find regular expression(#x, "(\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d)\{1\}"), "Global")
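The quantifiers in the attempts above (the trailing ` ?` and `{1}`) apply to the space or group inside a single match, not to how many matches a search returns. A minimal sketch in Python rather than uBot (the shortened sample `text` and the variable names are illustrative assumptions) of the difference between a first-match search and a global one:

```python
import re

# Shortened sample of the string from the question; `text` is an illustrative name.
text = (
    "<br />Harzuz Holdings Ltd<br />Ontario Potato Inc 905-791-7735<br />\n"
    "<em>Clark Packaging Products Inc</em>\n"
    "<br />2470347 Ontario Inc 416-223-4403<br />\n"
)

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

# A global search returns every match, whatever quantifier follows the group.
all_numbers = PHONE.findall(text)

# A non-global search stops at the first match.
first = PHONE.search(text).group(0)

# The second number is simply the second element of the global result.
second = all_numbers[1] if len(all_numbers) > 1 else None
```

Whether uBot exposes an equivalent first-match mode or a way to index into the global result is not something this sketch settles; it only shows the regex-level behaviour.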
Upvotes: 0
Views: 280
Reputation: 4139
I think you already know how to identify a telephone number; the problem is how to tell which one is the first.
Rather than relying on the word "first", I would advise binding your telephone number pattern to the keyword "Transferor" instead. This is more semantic, and it gives us the pattern
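(?<=Transferor)[\s\S]*?(\d{3}-\d{3}-\d{4})
Explanation
(?<=Transferor)
checks that the word "Transferor" appears immediately before this point,
[\s\S]*?
matches any characters, including newlines, but as few as possible (a greedy [\s\S]* would run on to the last phone number in the string),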
(\d{3}-\d{3}-\d{4})
telephone number.
The Transferor's telephone number is now captured in group 1 ($1).
Note that the regex above is written in a generic form; adapt it to .NET syntax as needed.
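As a rough illustration of binding the number to a keyword, here is a sketch in Python rather than .NET. The helper name `phone_after` and the shortened sample `text` are assumptions made for the example. It uses the lazy form `[\s\S]*?`, and also runs the greedy form for comparison.

```python
import re

# Shortened sample of the string from the question; `text` is an illustrative name.
text = (
    '<font color="#848484">Transferor(s)</font>\n'
    "<br />Harzuz Holdings Ltd<br />Ontario Potato Inc 905-791-7735<br />\n"
    '<font color="#848484">Transferee(s)</font>\n'
    "<br />2470347 Ontario Inc 416-223-4403<br />\n"
)

def phone_after(keyword, s):
    """Return the first phone number that follows `keyword`, or None.

    Hypothetical helper: the lookbehind anchors the match just after the
    keyword, and the lazy [\\s\\S]*? stops at the nearest phone number.
    """
    pattern = r"(?<=" + re.escape(keyword) + r")[\s\S]*?(\d{3}-\d{3}-\d{4})"
    m = re.search(pattern, s)
    return m.group(1) if m else None

transferor_phone = phone_after("Transferor", text)
transferee_phone = phone_after("Transferee", text)

# For comparison: the greedy form backtracks from the end of the string,
# so it captures the last phone number instead of the nearest one.
greedy_phone = re.search(
    r"(?<=Transferor)[\s\S]*(\d{3}-\d{3}-\d{4})", text
).group(1)
```

The same pattern with "Transferee" as the keyword picks out the second number, which covers the other half of the question.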
Upvotes: 1
Reputation: 53535
Since this is a multiline string, it contains newline characters. You can use them to differentiate between the first occurrence (which is on a line that ends with the newline symbol, \n) and the last one (assuming it sits on the final line with no newline after it):
(\d{3}-\d{3}-\d{4})(?=.*\n)
Tested in a .NET regex tester.
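A small sketch of how this lookahead behaves, written in Python rather than .NET (in both, `.` does not match `\n` by default). The shortened sample `text` is an assumption, and its last line deliberately has no trailing newline.

```python
import re

# Shortened sample; the final line has no trailing newline on purpose.
text = (
    "<br />Ontario Potato Inc 905-791-7735<br />\n"
    "<p>\n"
    "<br />2470347 Ontario Inc 416-223-4403<br />"
)

pattern = re.compile(r"(\d{3}-\d{3}-\d{4})(?=.*\n)")

# Only numbers on a line that is terminated by a newline are matched.
matches = pattern.findall(text)

# If the input does end with a newline, the last number matches as well.
matches_with_trailing_newline = pattern.findall(text + "\n")
```

This shows that the approach depends on the last number's line not ending in a newline, so it may be worth trimming the input first.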
Upvotes: 1