AtomicPorkchop
AtomicPorkchop

Reputation: 2675

How can I make this REGEX cleaner?

I have this regex I made to compare OS names to a line in a VMX file. It started out as separate elsif statments, but I ended up making into a single if statment. Anyhow, here is the code; I am trying to find a way to make the code cleaner, but it put each match on a separate line; it no longer works.

elsif ($vmx_file =~ m/guestOSAltName\s+=\s"Microsoft\sWindows\sServer\s2003,Web\sEdition"|"Microsoft\sWindows\sSmall\sBusiness\sServer\s2003"|"Microsoft\sWindows\s2000\sAdvanced\sServer"|"Microsoft\sWindows\s2000\sServer"|"Microsoft\sWindows\s2000\sProfessional"|"Microsoft\sWindows\s98"|"Microsoft\sWindows\s95"|"Microsoft\sWindows\sNT\s4"/) {
            $virtual_machines{$vm}{"Architecture"} = "32-bit";

Updated code as per suggestions,

elsif ($vmx_file =~ m/guestOSAltName\s+=\s"Microsoft\sWindows\sServer\s2003,Web\sEdition|Small\sBusiness\sServer\s2003|"2000\sAdvanced\sServer|2000\sServer|2000\sProfessional|98|95|NT\s4/) {
            $virtual_machines{$vm}{"Architecture"} = "32-bit";

Upvotes: 0

Views: 163

Answers (5)

mob
mob

Reputation: 118605

You can use the /x modifier to make your regex prettier, if not actually cleaner.

$vmx_file =~ m/guestOSAltName\s+=
    \s("Microsoft\sWindows\sServer\s2003,Web\sEdition"
     | "Microsoft\sWindows\sSmall\sBusiness\sServer\s2003"
     | "Microsoft\sWindows\s2000\sAdvanced\sServer"
     | "Microsoft\sWindows\s2000\sServer"
     | "Microsoft\sWindows\s2000\sProfessional"
     | "Microsoft\sWindows\s98"
     | "Microsoft\sWindows\s95"
     | "Microsoft\sWindows\sNT\s4")/x

When you look at it like this, the improvements Robokop and Konstantin Gredeskoul suggested become obvious:

$vmx_file =~ m/guestOSAltName\s+=
    \s"Microsoft\sWindows\s
     (     Server\s2003,Web\sEdition
         | Small\sBusiness\sServer\s2003
         | 2000\s( (Advanced\s)?Server | Professional )
         | 9[85]
         | NT\s4
     )
       "/x

Upvotes: 8

Mark Tozzi
Mark Tozzi

Reputation: 10903

You might not want a regex at all, for efficiency as well as clarity reasons. One option would be to capture the string outside the if block and match by hash keys:

#this could be offloaded to a constants file or some such
%architecture_by_os = (
    "Microsoft Windows Server 2003,Web Edition" => "32-bit",
    "Microsoft Windows Small Business Server 2003" => "32-bit",
    #etc.
)

$vmx_file =~ m/guestOSAltName\s+=\s(.*)/;
$virtual_machine{$vm}{Architecture} = $architecture_by_os{$1};

Upvotes: 5

Eugene Yarmash
Eugene Yarmash

Reputation: 149756

You can use interpolation in the pattern to make it more readable:

my $names = join '|', @names;
if ($vmx_file =~ m/guestOSAltName\s+=\s(?:$names)) {
    $virtual_machines{$vm}{Architecture} = "32-bit";
}

Upvotes: 2

kigster
kigster

Reputation: 620

You can use parenthesis to group similar items in a regex so you don't have to repeat "Microsoft\sWindows" every time. You should also use ? to indicate an optional or possibly missing item such as (\sWeb\sEdition)?.

(line breaks are for clarity)

m/guestOSAltName\s+=\s"Microsoft\sWindows\s?(Server\s2003(\sWeb\sEdition)?|
Small\sBusiness\sServer\s2003|
2000\sAdvanced\sServer|
2000\sServer|
2000\sProfessional
98|
NT\s4)"/

Hope this helps.

Upvotes: 1

Robokop
Robokop

Reputation: 926

You can start by doing something with the Microsoft Windows matching and then the rest like:

Microsoft\sWindows\s(Server\s2003,Web\sEdition|Small\SBussines...)

Upvotes: 4

Related Questions