Remko

Reputation: 7330

Extract version from string with Regular Expression and lookahead

I have the following string as an example. Version numbers are expected to change in the future, and the order may change or other types may be added:

Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)

I want to parse this with RegEx in the following manner:

My tries so far:

$text = "Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)"
[RegEx]::Match($text, ".*(?=\(Internet\040Explorer\))").Value

That returns Flash Player 12.0.0.38

So I think I need to skip "one or more words" without capturing them, then capture "one or more digits or dots" when followed by (Internet Explorer). I tried:

[RegEx]::Match($text, "(?:\w+)[\d\.]+(?=\(Internet\040Explorer\))").Value

But that does not match. Is the order incorrect? I am looking for the correct regex with a short explanation.

Upvotes: 1

Views: 2715

Answers (5)

user189198

$text = "Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)"
$Regex = '[\d\.]+'
# Avoid assigning to $Matches, which is a PowerShell automatic variable.
$VersionMatches = [Regex]::Matches($text, $Regex)

$VersionMatches[0].Value
$VersionMatches[1].Value

Result looks like:

12.0.0.38
12.0.0.43

EDIT: I have modified the regex to match, regardless of the order.

Clear-Host;

$matches = $null;
$Regex = '(?<=Flash Player\s)(?<FlashIE>[\d\.]+)(?:.*?)(?<FlashPlugin>[\d\.]+)(?=\s\(Plu)|(?<FlashPlugin>[\d\.]+)(?=\s\(Plu)(?:.*?)(?<=Flash Player\s)(?<FlashIE>[\d\.]+)';

# 1. Example string
$text = "Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)"
$MatchList = [Regex]::Matches($text, $Regex);
$MatchList[0].Groups['FlashIE'].Value;
$MatchList[0].Groups['FlashPlugin'].Value;

# 2. Reversed example string
$text = "; 12.0.0.43 (Plugin-based browsers);Flash Player 12.0.0.38 (Internet Explorer)"
$MatchList = [Regex]::Matches($text, $Regex);
$MatchList[0].Groups['FlashIE'].Value;
$MatchList[0].Groups['FlashPlugin'].Value;

# NOTE: Both of these yield exactly the same output because we are using named groups.

Result:

12.0.0.38
12.0.0.43
12.0.0.38
12.0.0.43

Upvotes: 1

Gooseman

Reputation: 2231

A little improvement with named groups:

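# -cmatch stops at the first match, so only one named group would be filled;
# iterate over all matches to collect both versions.
$pattern = '(?<plugin>(?:\d\.?)+) (?=\(Plugin-based browsers\))|(?<internet>(?:\d\.?)+) (?=\(Internet Explorer\))'
foreach ($m in [regex]::Matches($text, $pattern)) {
    if ($m.Groups['plugin'].Success)   { $PluginBrowsers   = $m.Groups['plugin'].Value }
    if ($m.Groups['internet'].Success) { $InternetExplorer = $m.Groups['internet'].Value }
}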

You could try this (not tested):

Using two regular expressions:

# Use separate variables so the second check does not overwrite the first.
if ($subject -cmatch '((?:\d\.?)+) (?=\(Internet Explorer\))') {
   $ieVersion = $matches[1]
} else {
   $ieVersion = ''
}

if ($subject -cmatch '((?:\d\.?)+) (?=\(Plugin-based browsers\))') {
   $pluginVersion = $matches[1]
} else {
   $pluginVersion = ''
}

Or just one:

# Note: -cmatch returns only the first match, so $result holds whichever version appears first.
if ($subject -cmatch '((?:\d\.?)+) ((?=\(Plugin-based browsers\))|(?=\(Internet Explorer\)))') {
   $result = $matches[1]
} else {
   $result = ''
}

Upvotes: 1

JonM

Reputation: 1374

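(?:\w+)[\d\.]+(?=\(Internet\040Explorer\)) doesn't match because the [\d\.]+(?=\(Internet\040Explorer\)) part does not allow for the space between the version number and (Internet Explorer). The lookahead has to succeed immediately after the last digit, where the next character is a space rather than an opening parenthesis.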
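To make the failure concrete, here is a minimal sketch that runs the original attempt next to a variant whose lookahead includes the space. The variant pattern and the variable names `$original` and `$fixed` are illustrative assumptions, not taken from the question.

```powershell
$text = 'Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)'

# Original attempt: the lookahead must match right after the digits,
# but the next character is a space, so no match is found.
$original = [regex]::Match($text, '(?:\w+)[\d\.]+(?=\(Internet\040Explorer\))')
$original.Success

# Assumed variant: the lookahead now also consumes the space (\040).
$fixed = [regex]::Match($text, '[\d\.]+(?=\040\(Internet\040Explorer\))')
$fixed.Value
```

The first expression should evaluate to False, and the second should yield only the version number that precedes (Internet Explorer).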

This expression will capture both of the required values no matter the order:

(?=^.*?([\d\.]+)(?=(?> *)(?:\(Internet\040Explorer\))))(?=^.*?([\d\.]+)(?=(?> *)(?:\(Plugin-based browsers\))))

In PowerShell:

$text = "Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)"
$regex = "(?=^.*?([\d\.]+)(?=(?> *)(?:\(Internet\040Explorer\))))(?=^.*?([\d\.]+)(?=(?> *)(?:\(Plugin-based browsers\))))"
$match = [RegEx]::Match($text, $regex)   # avoid reusing the automatic variable $Matches
echo $match.Groups[1].Value  # Outputs 12.0.0.38
echo $match.Groups[2].Value  # Outputs 12.0.0.43

Example order:

Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)
[Match number 1]
Matched: '' at character 1
[Capture Group 1] '12.0.0.38' found at character 14
[Capture Group 2] '12.0.0.43' found at character 45

Reverse example order:

12.0.0.43 (Plugin-based browsers); Flash Player 12.0.0.38 (Internet Explorer)
[Match number 1]
Matched: '' at character 1
[Capture Group 1] '12.0.0.38' found at character 49
[Capture Group 2] '12.0.0.43' found at character 1

Upvotes: 1

Alex Filipovici

Reputation: 32551

Try this:

([0-9.]*)(?: (\(Internet Explorer\)|\(Plugin-based browsers\)))
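One possible way to apply this pattern from PowerShell is to iterate over every match and key each version by its label. This is only a sketch: the hashtable `$versions` and the step that strips the parentheses from the label are assumptions added for illustration.

```powershell
$text = 'Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)'
$pattern = '([0-9.]*)(?: (\(Internet Explorer\)|\(Plugin-based browsers\)))'

# Group 1 holds the version, group 2 holds the label including its parentheses.
$versions = @{}
foreach ($m in [regex]::Matches($text, $pattern)) {
    $label = $m.Groups[2].Value -replace '[()]', ''   # strip the parentheses (illustrative)
    $versions[$label] = $m.Groups[1].Value
}

$versions['Internet Explorer']
$versions['Plugin-based browsers']
```

Because the lookup is by label rather than by position, the result should not depend on the order in which the two entries appear in the string.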


Upvotes: 1

mjolinor

Reputation: 68263

If you're not sure what order they might be in:

$text = "Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)"

$IE_Version = $text -replace '.+\s([0-9.]+)\s\(Internet Explorer\).*','$1'
$Plugin_Version = $text -replace '.+\s([0-9.]+)\s\(Plugin-based browsers\).*','$1'

$IE_Version
$Plugin_Version

12.0.0.38
12.0.0.43

Briefly, the regex logic is:

Search the string until you find a space followed by a run of digits and dots, followed by another space and then the literal string (Internet Explorer). Capture the run of digits and dots, and replace the entire string with just that capture. Repeat with the literal string (Plugin-based browsers).
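One caveat with the -replace approach: when the pattern does not match, -replace returns the input string unchanged, so a missing entry cannot be told apart from a found version. The sketch below swaps in -match with a capture group so that a miss yields $null instead. The function name `Get-VersionFor` and its parameters are hypothetical and not part of the answer above.

```powershell
# Hypothetical helper: returns the version that precedes "(<Label>)", or $null if absent.
function Get-VersionFor {
    param([string]$Text, [string]$Label)
    # [regex]::Escape makes the parentheses and spaces in the label literal.
    $pattern = '([0-9.]+)\s' + [regex]::Escape("($Label)")
    if ($Text -match $pattern) { $Matches[1] } else { $null }
}

$text = 'Flash Player 12.0.0.38 (Internet Explorer); 12.0.0.43 (Plugin-based browsers)'
$ieVersion     = Get-VersionFor -Text $text -Label 'Internet Explorer'
$pluginVersion = Get-VersionFor -Text $text -Label 'Plugin-based browsers'
$missing       = Get-VersionFor -Text $text -Label 'Some Other Browser'

# For comparison: -replace hands back the whole input when nothing matches.
$unchanged = 'no versions here' -replace '.+\s([0-9.]+)\s\(Internet Explorer\).*', '$1'
```

Since this pattern does not require any text before the version, it should also work when the entry being looked up is the first thing in the string.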

Upvotes: 1
