Reputation: 169
I have written a script to grab different fields from an HTML file and populate variables with the results. I'm having trouble with the regular expression for grabbing the email address. Here is some sample code:
$txt='<p class=FillText><a name="InternetMail_P3"></a>[email protected]</p>'
$re='.*?'+'([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})'
if ($txt -match $re)
{
$email1=$matches[1]
write-host "$email1"
}
I get the following error:
Bad argument to operator '-match': parsing ".*?([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\
.)+[a-zA-Z]{2,7})([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})" - [x-y] range in reverse order..
At line:7 char:16
+ if ($txt -match <<<< $re)
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : BadOperatorArgument
What am I missing here? Also, is there a better regex for email?
Thanks in advance.
Upvotes: 5
Views: 12079
Reputation: 8669
Any regex that works in .NET or C# will also work in PowerShell, and you can find plenty of samples on Stack Overflow and elsewhere on the internet. For example, from "How to Find or Validate an Email Address" (The Official Standard: RFC 2822):
$txt='<p class=FillText><a name="InternetMail_P3"></a>[email protected]</p>'
# Single-quoted so that $ and the backtick stay literal; the apostrophe is doubled.
$re='[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'
[regex]::Match($txt, $re, 'IgnoreCase')
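The call above returns a Match object and does not hand back the address as a string. A minimal sketch of reading the result follows. It assumes a placeholder address, user@example.com, because the address in the sample is obfuscated, and it writes the pattern in a single-quoted string so that the backtick reaches the regex engine unchanged.

```powershell
# Placeholder input: user@example.com is an assumption for illustration only.
$txt = '<p class=FillText><a name="InternetMail_P3"></a>user@example.com</p>'

# Single-quoted string: backslash, $ and backtick are literal; only ' is doubled.
$re = '[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'

$m = [regex]::Match($txt, $re, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)

# Success tells whether anything matched; Value holds the matched text.
if ($m.Success) {
    $email1 = $m.Value
    Write-Host $email1
}
```

Checking `$m.Success` before using `$m.Value` avoids silently storing an empty string when the field is missing from the HTML.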
There is another part to this answer, though. Regex is by nature not well suited to parsing XML/HTML. You can find more details here: Using regular expressions to parse HTML: why not?
To provide a real solution, I recommend first
Upvotes: 11
Reputation: 3176
For email validation I usually choose the short version of the RFC 2822 pattern:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
You can find more info about email validation here
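As for the error in the question: in a single-quoted PowerShell string the backslash is not an escape character, so `\\w` reaches the .NET engine as an escaped backslash followed by a plain `w`. Inside the brackets, `w-+` is then read as a range running from `w` down to `+`, which is what "[x-y] range in reverse order" is complaining about. A minimal sketch of a corrected version of that pattern, assuming the placeholder address user@example.com and dropping the leading `.*?` (which `-match` does not need), uses single backslashes and puts the hyphen last in each character class:

```powershell
# Placeholder input: user@example.com is an assumption for illustration only.
$txt = '<p class=FillText><a name="InternetMail_P3"></a>user@example.com</p>'

# Single backslashes, and the hyphen placed last so it is a literal, not a range.
$re = '([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})'

if ($txt -match $re)
{
    # $matches[1] is the first capture group, i.e. the whole address here.
    $email1 = $matches[1]
    Write-Host "$email1"
}
```

This keeps the looseness of the original pattern; the longer RFC 2822 expression above is stricter about which characters may appear.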
Upvotes: 2