Reputation: 3
I have strings from which I need to parse a name and a version number as separate fields. The version number may contain letters as well.
Example Strings:
AntivirusOwner10.5.6.R01.Vr561
Antivirus2010Owner10.5.6.R01.Vr561
Antivirus_abc Movbsd 2008 abc r6 10.20.161.17
Antivirus_abc Movbsd .abc 4.5.6.7
Antivirus_abc Movbsd .mnc 4
Expected separation:
AntivirusOwner                      10.5.6.R01.Vr561
Antivirus2010Owner                  10.5.6.R01.Vr561
Antivirus_abc Movbsd 2008 abc r6    10.20.161.17
Antivirus_abc Movbsd .abc           4.5.6.7
Antivirus_abc Movbsd .mnc           4
Upvotes: 0
Views: 376
Reputation: 13432
Based on your example strings, I would assume that the package name ends right before a number that is followed by a dot (.). A regex for this could look like the following example:
# Sample inputs: the strings from the question plus one extra ".Net" case
$packageDescriptions = 'AntivirusOwner10.5.6.R01.Vr561', 'Antivirus2010Owner10.5.6.R01.Vr561',
    'Antivirus_IIS .Net10.12.14.16', 'Antivirus_abc Movbsd 2008 abc r6 10.20.161.17',
    'Antivirus_abc Movbsd .abc 4.5.6.7', 'Antivirus_abc Movbsd .mnc 4'
foreach ($packageDescription in $packageDescriptions) {
    # Group 1 = name (lazy), group 2 = version (anchored to the end of the string)
    if ($packageDescription -imatch '^(.*?)(\d+\.[\w\.]*|\d+)$') {
        # Emit one object with the two captured groups as properties
        Select-Object @{n='PackageName'; e={$Matches[1]}}, @{n='PackageVersion'; e={$Matches[2]}} -InputObject ''
    } else {
        Write-Warning "'$packageDescription' is not covered by this regex!"
    }
}
Output:
PackageName                       PackageVersion
-----------                       --------------
AntivirusOwner                    10.5.6.R01.Vr561
Antivirus2010Owner                10.5.6.R01.Vr561
Antivirus_IIS .Net                10.12.14.16
Antivirus_abc Movbsd 2008 abc r6  10.20.161.17
Antivirus_abc Movbsd .abc         4.5.6.7
Antivirus_abc Movbsd .mnc         4
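One detail the table hides: when the name and version are separated by a space, that space ends up at the end of group 1. Below is a minimal sketch of how the same regex could be wrapped in a function that trims the name. The function name Split-PackageDescription and the group names "name" and "version" are my own choices for illustration, not something from the question.

```powershell
# Hypothetical helper: same pattern as above, but with named groups
# and the captured name trimmed of surrounding whitespace.
function Split-PackageDescription {
    param([Parameter(Mandatory)][string]$Description)

    $pattern = '^(?<name>.*?)(?<version>\d+\.[\w\.]*|\d+)$'
    if ($Description -match $pattern) {
        [pscustomobject]@{
            PackageName    = $Matches['name'].Trim()
            PackageVersion = $Matches['version']
        }
    } else {
        Write-Warning "'$Description' is not covered by this regex!"
    }
}

$result = Split-PackageDescription 'Antivirus_abc Movbsd .mnc 4'
# $result.PackageName    is 'Antivirus_abc Movbsd .mnc' (no trailing space)
# $result.PackageVersion is '4'
```

Whether trimming is wanted depends on what you do with the name afterwards; if you need the string exactly as it appeared, drop the .Trim() call.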
Explanation of the regex "^(.*?)(\d+\.[\w\.]*|\d+)$":
It has two groups, each enclosed in (). The first is the name. It matches anything, but in a non-greedy (lazy) way (note the ? after .*), so it takes as few characters as possible and leaves as much as it can to group 2. Group 2 (the version) must either start with at least one digit followed by a dot, followed by any mix of word characters and dots, OR consist of digits only, which covers the case where the version is just 4 (without dots). The trailing $ forces group 2 to run all the way to the end of the string.
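To see why the ? in the first group matters, here is a small sketch that runs the same pattern with a lazy and a greedy first group on one of the sample strings. The variable names are only for illustration.

```powershell
$sample = 'Antivirus2010Owner10.5.6.R01.Vr561'

# Lazy first group: the name stops at the earliest position
# from which group 2 can match through to the end of the string.
if ($sample -match '^(.*?)(\d+\.[\w\.]*|\d+)$') {
    $lazyName    = $Matches[1]   # Antivirus2010Owner
    $lazyVersion = $Matches[2]   # 10.5.6.R01.Vr561
}

# Greedy first group: the name swallows as much as it can and
# leaves group 2 only the final digit.
if ($sample -match '^(.*)(\d+\.[\w\.]*|\d+)$') {
    $greedyName    = $Matches[1] # Antivirus2010Owner10.5.6.R01.Vr56
    $greedyVersion = $Matches[2] # 1
}
```

With the greedy variant the match still succeeds, which makes the mistake easy to miss: the version is simply cut down to its last digit.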
Upvotes: 3