AlexandruC

Reputation: 3637

C# regex for awkward email addresses and phone numbers

I've used the following two regexes, which I found here:

  1. For a 10-digit phone number with a space before and after it: @"(?<!\d)\d{10}(?!\d)"

  2. For email:

@"(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@"
                   + @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])."
                   + @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9]).([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
                   + @"([a-zA-Z]+[\w-]+.)+[a-zA-Z]{2,4})";

The email regex works fine for normal, correct email addresses, but my input text isn't actually normal and correct, so I have to handle another special case. I would also like it to parse:

myname@this -email.com *or* myname@this- email.com *or* myname@this - email.com

Also, the 10-digit phone number regex only extracts the last 9 digits when the number starts with 0:

for nr = 0123456789 I only get 123456789. The phone number appears in text like: "some text here 0123456789 and some more here".

I've also found that the 10-digit number may appear in this form: 012/3456789

Upvotes: 0

Views: 105

Answers (1)

Robin

Reputation: 9644

  1. I don't believe you need lookarounds to match your phone number. You could use something like " (\d[^0-9a-zA-Z]*?){9}\d " to match numbers exactly 10 digits long, with any kind of junk between the digits (except the letters a-z and A-Z). Careful though: it would also match things like " 012 à 3456789 ". If a single slash is the only possible separator, you can use " (\d/?){9}\d " instead: no false positives, but you won't match unexpected separators.

  2. Regarding the email matter, the official complete regex is pretty long. Gusdor's link (http://www.regular-expressions.info/email.html) gives some context and a simple one that should match pretty much every email address actually in use. Tweaking it a little bit to match your case (this -domain.com or this- domain.org would be matched), you could use something like this:

    [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9]*( *?- *)?[a-z0-9]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
    

The custom part here is the ( *?- *)? added in the right place.
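To make both patterns concrete, here is a minimal C# sketch of how they could be wired up. Several things in it are assumptions rather than part of the answer above: the class and method names (AwkwardContactExtractor, ExtractPhones, ExtractEmails) are hypothetical, the phone pattern borrows the (?<!\d) and (?!\d) lookarounds from the question in place of the literal surrounding spaces, RegexOptions.IgnoreCase is added because the email pattern only lists lowercase letters, and stripping the slash and the stray spaces from each match is an extra normalization step.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class AwkwardContactExtractor
{
    // Ten digits, each of the first nine optionally followed by one slash.
    // The lookarounds (taken from the question) reject longer digit runs.
    private static readonly Regex Phone =
        new Regex(@"(?<!\d)(?:\d/?){9}\d(?!\d)");

    // The email pattern from the answer, split over lines for readability.
    // ( *?- *)? tolerates spaces around one hyphen inside a domain label.
    private static readonly Regex Email = new Regex(
        @"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@" +
        @"(?:[a-z0-9](?:[a-z0-9]*( *?- *)?[a-z0-9]*[a-z0-9])?\.)+" +
        @"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractPhones(string text)
    {
        var result = new List<string>();
        foreach (Match m in Phone.Matches(text))
        {
            // Keep the value as a string so a leading 0 is preserved.
            result.Add(m.Value.Replace("/", ""));
        }
        return result;
    }

    public static List<string> ExtractEmails(string text)
    {
        var result = new List<string>();
        foreach (Match m in Email.Matches(text))
        {
            // Remove the stray spaces around the hyphen (assumed cleanup step).
            result.Add(m.Value.Replace(" ", ""));
        }
        return result;
    }

    public static void Main()
    {
        var phones = ExtractPhones(
            "some text here 0123456789 and some more here, or 012/3456789");
        Console.WriteLine(string.Join(", ", phones));
        // prints: 0123456789, 0123456789

        var emails = ExtractEmails(
            "myname@this -email.com or myname@this- email.com or myname@this - email.com");
        Console.WriteLine(string.Join(", ", emails));
        // prints: myname@this-email.com, myname@this-email.com, myname@this-email.com
    }
}
```

The phone numbers are kept as strings on purpose: converting a match with int.Parse or long.Parse would drop the leading 0. The question doesn't show how the matches are read, so whether that explains the missing digit is only a guess.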

Upvotes: 1
