Wimpy

Reputation: 33

PowerShell Parsing Name and Version out of a String

I want to parse the package name and the version out of a string.

The strings have the following format:

EntityFramework.6.2.0

EntityFramework.Functions.1.4.1

What I want to have is an array or an object with the name of the package and the version.

The version number can have 1, 2, 3 or 4 components, and the name can also contain ".".

$version = @()
$name = @()

"EntityFramework.Functions.1.4.1".Split('.') | % {
  if ($_ -match "^\d+$"){
   $version += $_
  }else{
    $name += $_
  }
}

$name -join "."
$version -join "."

This works, but I think there is a better way to do it.

Any ideas on how to shorten this snippet or make it smarter?

Upvotes: 3

Views: 3163

Answers (3)

mklement0

Reputation: 437988

Note: This is an optimized variation of the original answer below, courtesy of TheIncorrigible1.

By using the -split operator with a separator regex that uses lookaround assertions, it is possible to split the string in the desired location with a single operation:

# Stores 'EntityFramework.Functions' in $name
# and '1.4.1' in $version
$name, $version = "EntityFramework.Functions.1.4.1" -split '(?<=[^\d])\.(?=\d)'
  • (?<=[^\d])\.(?=\d) uses a look-behind assertion ((?<=...)) and a look-ahead assertion ((?=...)) to provide the desired context for matching the literal . (\.):

    • The regex matches the . only if preceded by a character that is not a digit ([^\d]) and is followed by a digit, which is where we want to split: between the end of the package name and the start of the version number.

    • Regex assertions in general do not capture characters, so even though the surrounding characters are looked at, only the . is considered the separator, ensuring that the tokens on either side of it are returned in full.

  • The result of the -split operation is a 2-element array, whose elements can be assigned to individual variables via a destructuring assignment ($name, $version = ...).
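
To see the lookaround-based split applied to both sample strings from the question, here is a minimal sketch. The function name Split-PackageId and the property names are hypothetical, chosen only for illustration, and the limit of 2 passed to -split is an added assumption so that a later non-digit-dot-digit position (for instance in a pre-release suffix) would not produce a third element:

```powershell
# Hypothetical helper (not part of the answer above): splits a package
# string into name and version with the lookaround-based separator.
function Split-PackageId {
    param([Parameter(Mandatory)][string] $Id)

    # Split at the '.' that sits between a non-digit and a digit.
    # The limit of 2 (an assumption) stops after the first such position.
    $name, $version = $Id -split '(?<=[^\d])\.(?=\d)', 2

    [pscustomobject]@{ Name = $name; Version = $version }
}

$results = 'EntityFramework.6.2.0', 'EntityFramework.Functions.1.4.1' |
    ForEach-Object { Split-PackageId $_ }

$results
```

Each element of $results carries the name and the version as separate string properties, which matches the "array or object" the question asks for.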


Original answer:

Note: While the regex used below is slightly shorter than the one above, its interplay with -split is actually conceptually more complex, and the solution requires an additional operation to filter out an empty result element (-ne '').

A more concise solution that uses the -split operator with a regex (regular expression):

# Stores 'EntityFramework.Functions' in $name
# and '1.4.1' in $version
$name, $version = "EntityFramework.Functions.1.4.1" -split '^([^\d]+)\.' -ne ''
  • ^([^\d]+)\. starts matching at the start of the string (^) and matches one or more (+) non-digit characters ([^\d]) followed by a literal . (\.)

    • This matches EntityFramework.Functions., but, due to enclosing only the part before the trailing . in (...) to form a capture group, only EntityFramework.Functions is returned.
      (By default, what the separator regex matches is not returned - after all, you just want the tokens between the separators - but a capture group embedded in the regex can be used to deliberately include part of the separator in the result array.)

    • The separator regex is by definition not found again in the input string (because it is anchored at the start of the string with ^), so the remainder of the string - 1.4.1 - is considered the 2nd and only remaining token.

  • -ne '' filters out the empty first element of the resulting array, which is a side effect of the string starting with a match of the separator regex.

    • In the typical case your regex matches just the separators and doesn't include them (or parts of them) in the result array; thus, unless the input truly starts with a separator instance or you have adjacent separators, you won't get empty elements; e.g., 'foo,bar;baz' -split '[,;]' yields 'foo', 'bar', 'baz', without empty elements.
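
The role of -ne '' is easier to see when the split and the filter are run as two separate steps. This is only an illustrative sketch; the variable names $raw and $clean are not from the answer:

```powershell
# Step 1: split only. The input starts with a separator match, so the
# first element is an empty string, followed by the captured name.
$raw = 'EntityFramework.Functions.1.4.1' -split '^([^\d]+)\.'

# Step 2: with an array on the left-hand side, -ne acts as a filter
# and returns only the elements that are not the empty string.
$clean = $raw -ne ''

"Before filtering: $($raw.Count) elements"
"After filtering:  $($clean.Count) elements"
```

Before filtering, the array holds the empty string, EntityFramework.Functions and 1.4.1; after filtering, only the last two remain.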

Upvotes: 2

Maximilian Burszley

Reputation: 19664

This can be improved by just relying on regex from the start:

$null = 'EntityFramework.Functions.1.4.1' -match '(?<name>[^\d]+)(?<version>\d.+)'
$name, $version = $Matches['name'].TrimEnd('.'), [version]$Matches['version']

$name
>> EntityFramework.Functions

$version
>> Major  Minor  Build  Revision
>> -----  -----  -----  --------
>> 1      4      1      -1

Explained:

(           // Capture a group  
  ?<name>   // Name it "name"
    [^\d]+  // Capture until you find a digit
)           // End capture group

(             // Capture a group
  ?<version>  // Name it "version"
    \d.+      // Start at a digit and wildcard catch everything after
)             // End capture group
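
When the same idea is applied to several strings, it may be worth testing the result of -match, because $Matches is only updated on a successful match and would otherwise still hold the previous string's groups. The following is a sketch under assumptions: the pattern is a slightly altered variation of the one above (anchored, with the separating . kept outside the name group so that TrimEnd is not needed), and 'NoVersionHere' is a made-up input without a version:

```powershell
$ids = 'EntityFramework.6.2.0', 'EntityFramework.Functions.1.4.1', 'NoVersionHere'

$parsed = foreach ($id in $ids) {
    # Only read $Matches when -match succeeded; a failed match leaves
    # the previous iteration's values in place.
    if ($id -match '^(?<name>[^\d]+)\.(?<version>\d.*)$') {
        [pscustomobject]@{
            Name    = $Matches['name']
            Version = [version] $Matches['version']
        }
    }
}

$parsed
```

Note that the [version] cast accepts two to four numeric components, so a single-component version such as 6 would need separate handling, for example by keeping it as a string.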

Shortened (for haxxorz):

if ('EntityFramework.Functions.1.4.1' -match '(.*?(?=\.\d))\.(.+)')
{
    $name, [version]$version = $matches[1, 2]
}

(gottagoshort):

$name,$version='EntityFramework.Functions.1.4.1'-split'(?<=[^\d])\.(?=\d)'

Upvotes: 4

Jacob Colvin

Reputation: 2835

@(
'EntityFramework.6.2.0',
'EntityFramework.Functions.1.4.1'
) | %{

    [pscustomobject]@{
        name = $_ -replace '\.([0-9]).*([0-9])$'
        version = $_ -replace '^([A-Za-z]).*([A-Za-z])\.'
    }
}

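This separates each item by character type: the first -replace removes the trailing part that starts and ends with a digit (the version), leaving the name, and the second removes the leading part that starts and ends with a letter (the name), leaving the version.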

Upvotes: 0
