Reputation: 2675
I am trying to use a regex to filter out everything except one very specific line of text from a VMware VMX file. I run this through a foreach loop because there is one such file per VM. On each iteration, the loop assigns the output of Net::OpenSSH (which runs cat against the file on the VM server) to a scalar variable.
I am not sure if that actually made any sense.
Anyhow, the problem I am running into is that when the script runs, my regex does not match anything; it just displays all of the cat'ed VMX files one after another. I can't figure out what I am missing.
Here is a sample of the code I am working on:
sub get_virtual_machines {
    my $esx_host = config_file()->{ESX}{host};
    my $ssh_port = config_file()->{ESX}{port};
    my $esx_user = config_file()->{ESX}{user};
    my $esx_password = config_file()->{ESX}{password};
    my %options = (
        port => $ssh_port,
        user => $esx_user,
        password => $esx_password
    );
    my $ssh1 = Net::OpenSSH->new($esx_host, %options);
    print color 'blue';
    print "Collecting virtual machine data for $esx_host\n";
    my @virtual_machines = $ssh1->capture('vim-cmd vmsvc/getallvms');
    shift @virtual_machines;
    print color 'reset';
    # Filter data from ESX\ESXi output
    my %virtual_machines = ();
    foreach my $vm (@virtual_machines) {
        # Replace "[" with "/"
        $vm =~ s/\[/\//;
        # Replace "]" with "/"
        $vm =~ s/\]/\//;
        # Match ID, NAME and VMX location
        $vm =~ m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;
        # Build hash table of discovered virtual machines
        $virtual_machines{"$2"}{"ID"} = "$1";
        $virtual_machines{"$2"}{"VMX"} = "/vmfs/volumes$3$4";
        $virtual_machines{"$2"}{"Version"} = "$9";
    }
    undef @virtual_machines;
    foreach my $vm (keys %virtual_machines) {
        $vm = $ssh1->capture("cat $virtual_machines{$vm}{VMX}");
        $vm =~ m/^(\bguestOSAltName\b)/x;
        print "$1\n";
    }
    #print Dumper (\%virtual_machines);
}
The part in question is the loop after the "undef @virtual_machines" line. My first goal is to match the line containing the word "guestOSAltName". I think once I get that part done I will be on my way again; I have just hit a roadblock.
Here is a sample VMX file to look at as well:
.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "7"
pciBridge0.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
nvram = "NS02.nvram"
deploymentPlatform = "windows"
virtualHW.productCompatibility = "hosted"
unity.customColor = "|23C0C0C0"
tools.upgrade.policy = "useGlobal"
powerType.powerOff = "default"
powerType.powerOn = "default"
powerType.suspend = "default"
powerType.reset = "default"
displayName = "NS02"
extendedConfigFile = "NS02.vmxf"
scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
memsize = "512"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "NS02.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
ide1:0.present = "TRUE"
ide1:0.clientDevice = "FALSE"
ide1:0.deviceType = "cdrom-image"
ide1:0.startConnected = "FALSE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "solignis.local"
ethernet0.addressType = "generated"
chipset.onlineStandby = "FALSE"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
uuid.location = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
uuid.bios = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
vc.uuid = "52 50 c1 4b be 91 07 d5-22 0e 86 ee db 88 6d 8a"
snapshot.action = "keep"
sched.cpu.min = "0"
sched.cpu.units = "mhz"
sched.cpu.shares = "normal"
sched.mem.minsize = "0"
sched.mem.shares = "normal"
sched.scsi0:0.shares = "normal"
bios.forceSetupOnce = "FALSE"
floppy0.present = "FALSE"
ethernet0.generatedAddress = "00:0c:29:fc:28:d9"
tools.syncTime = "FALSE"
cleanShutdown = "FALSE"
replay.supported = "FALSE"
sched.swap.derivedName = "/vmfs/volumes/4cbcad5b-b51efa39-c3d8-001517585013/NS02/NS02-510988a0.vswp"
scsi0:0.redo = ""
vmotion.checkpointFBSize = "4194304"
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "32"
vmci0.pciSlotNumber = "33"
ethernet0.generatedAddressOffset = "0"
vmci0.id = "536619225"
hostCPUID.0 = "0000000a756e65476c65746e49656e69"
hostCPUID.1 = "000006fb000408000000e3bdbfebfbff"
hostCPUID.80000001 = "00000000000000000000000120100800"
guestCPUID.0 = "0000000a756e65476c65746e49656e69"
guestCPUID.1 = "000006fb00010800800022010febfbff"
guestCPUID.80000001 = "00000000000000000000000120100800"
userCPUID.0 = "0000000a756e65476c65746e49656e69"
userCPUID.1 = "000006fb000408000000e3bdbfebfbff"
userCPUID.80000001 = "00000000000000000000000120100800"
evcCompatibilityMode = "FALSE"
ide1:0.fileName = "/usr/lib/vmware/isoimages/linux.iso"
Upvotes: 2
Views: 329
Reputation: 4251
As @canavanin has said, the problem is that you have multi-line text, so you need to use m//m in order to have ^ and $ mean start and end of line (instead of start and end of string). It is also better (safer) to capture the match into a variable (since Perl 5.10 you also have named captures, as @Potter pointed out). Finally, m//x is very useful, but only if you write your regex over several lines so that you can add comments and lay it out freely; in a single-line regex it is useless and error prone, because people forget to write spaces explicitly as \s or \s+ and type literal spaces instead, which /x silently ignores.
Also, as you said you wanted to print the whole line, not only 'guestOSAltName', you need to capture up to the end of the line: m/(^guestOSAltName .+$)/m (if you add single-line mode to multi-line mode, as in //ms, then you would need to make the .+ non-greedy, .+?, so that $ can match at the end of the first line before the greedy .+ runs on past it in single-line mode).
[not working code]
$vm =~ m/^(\bguestOSAltName\b)/x;
print "$1\n";
[working code]
# make list context with parentheses
# (m{...} delimiters: a "/" inside an /x comment would end an m/.../ pattern early)
(my $guest_os_alias_line) = $vm =~ m{^   # start of line (using /m)
    (                 # start capturing
        guestOSAltName
        \b            # just in case guestOSAltName is a substring in an unwanted line
        .+            # everything else in the line (not matching \n because no /s)
    )                 # end capturing
    $                 # end of line (because /m)
}xm;                  # multiline mode
print "$guest_os_alias_line\n";
If you have more than one such line, then you would want multiple-match mode (/g) and to capture into an array:
(my @guest_os_alias_lines) = $vm =~ m{^   # start of line (using /m)
    (                 # start capturing
        guestOSAltName
        \b            # just in case guestOSAltName is a substring in an unwanted line
        .+            # everything else in the line (not matching \n because no /s)
    )                 # end capturing
    $                 # end of line (because /m)
}xmg;                 # multiline mode (m) and multi-matching (g)
print "$_\n" for @guest_os_alias_lines;   # the captures stop before "\n", so add one per line
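The same /mg list-context behaviour extends to pulling out every setting at once (an extension, not something the answer above proposes): with two capture groups, the match returns key, value, key, value, ..., which can be assigned straight to a hash. A sketch, assuming every relevant line has the simple key = "value" form seen in the sample VMX file:

```perl
use strict;
use warnings;

# A few lines from the sample VMX file, including an empty value.
my $vmx = <<'END';
displayName = "NS02"
memsize = "512"
scsi0:0.redo = ""
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
END

# Two captures per line: the key (no whitespace) and the text between the quotes.
# In list context, /g returns all captures from all lines, in order.
my %setting = $vmx =~ /^(\S+)\s*=\s*"(.*)"$/mg;

print "$setting{displayName}: $setting{guestOSAltName}\n";
```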
Upvotes: 0
Reputation: 1174
If I guess right at what you want, it's probably something like
if ( $vm =~ /^guestOSAltName = (.+)$/m )   # /m lets ^ and $ match at every line, not just the string ends
{
    print "$1\n";
}
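Another line-oriented option, not taken from the answers here: Net::OpenSSH's capture returns the output as a list of lines when called in list context (the question already relies on this for the getallvms output), so the VMX lines can be filtered with grep, and a plain ^ anchor works because each element is a single line. To stay runnable without an SSH connection, this sketch fakes that list with split /^/, which keeps the trailing newline on each line:

```perl
use strict;
use warnings;

my $vmx = <<'END';
chipset.onlineStandby = "FALSE"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
END

# Stand-in for: my @lines = $ssh1->capture("cat $path");
# split /^/ is treated as split /^/m, so this yields one element per line.
my @lines = split /^/, $vmx;

# Each element is one line, so no /m is needed for ^ here.
my ($alt_name_line) = grep { /^guestOSAltName\b/ } @lines;

chomp $alt_name_line if defined $alt_name_line;
print "$alt_name_line\n";
```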
Upvotes: 1
Reputation: 9135
It's hard to say with the information you've given, but I think the problem is that the regex
$vm =~ m/^(\bguestOSAltName\b)/x;
doesn't match the file you've given, because the ^ assertion matches start-of-string, not start-of-line. Since the regex doesn't match, $1 keeps its old value from earlier in the program, which gets printed out. For safety, you should check that the regex actually matched before using captures:
use Carp;   # provides carp (normally placed at the top of the file)

if ($vm =~ m/^(\bguestOSAltName\b)/x) {
    print "$1\n";
}
else {
    carp "Couldn't find guestOSAltName!";
}
Or grab the captures by putting the match in list context:
# $result gets $1 if the match succeeds, undef if it fails.
my ($result) = $vm =~ m/^(\bguestOSAltName\b)/x;
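A small demonstration of the stale-$1 problem and of the list-context fix, using a fragment of the sample VMX file. The $earlier string is hypothetical; it just stands in for some earlier successful match in the program:

```perl
use strict;
use warnings;

my $vmx = qq{displayName = "NS02"\nguestOSAltName = "Ubuntu Linux (64-bit)"\n};

my $earlier = 'guestOS = "ubuntu-64"';
$earlier =~ /^(guestOS)\b/;            # succeeds and sets $1 to "guestOS"

$vmx =~ m/^(\bguestOSAltName\b)/x;     # fails: without /m, ^ is start-of-string only
my $stale = $1;                         # still "guestOS" from the earlier match

# List-context assignment gives undef on failure instead of a leftover value.
my ($result) = $vmx =~ m/^(\bguestOSAltName\b)/x;

print "stale \$1: $stale\n";
print defined $result ? "result: $result\n" : "result: undef (no match)\n";
```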
To make ^ match start-of-line, you need the /m modifier, which changes ^ and $ to match linewise instead of stringwise:
if ($vm =~ m/^(\bguestOSAltName\b)/xm) { ... }
This is why Damian Conway, in Perl Best Practices, recommends that you always use /m: then ^ and $ always do what you intuitively think they should do. [He in fact recommends always using /xms. You're one-third of the way there :)]
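One quick way to see the linewise/stringwise difference is to count how many positions ^ can match in a three-line fragment of the sample file. This sketch uses the common "() =" idiom to count matches in list context:

```perl
use strict;
use warnings;

my $vmx = qq{displayName = "NS02"\nguestOSAltName = "Ubuntu Linux (64-bit)"\nguestOS = "ubuntu-64"\n};

# Count the positions where ^ matches, with and without /m.
my $without_m = () = $vmx =~ /^/g;    # only the start of the string
my $with_m    = () = $vmx =~ /^/mg;   # the start of every line (not after the final "\n")

print "without /m: $without_m, with /m: $with_m\n";
```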
PS: Everything from this point on is general code-review criticism, not directly related to the question. I hope it's useful, but feel free to ignore it.
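I find that overuse of escaped characters in regexes and other double-quotish contexts, such as
$vm =~ s/\[/\//;
is often better avoided by choosing different delimiters, so that the / in the replacement needs no escaping:
$vm =~ s{\[}{/};
(Single-quote delimiters, as in s'['/', only turn off variable interpolation; [ is still a regex metacharacter, so it must stay escaped or the pattern fails to compile with "Unmatched [".)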
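A sketch of both points, using a made-up getallvms-style path (not taken from the question): brace delimiters remove the need to escape the slash, while the single-quote form still fails because of the unescaped [. The failing version is compiled inside a string eval so the script itself keeps running:

```perl
use strict;
use warnings;

my $path = '[datastore1] NS02/NS02.vmx';   # hypothetical datastore path

# Brace delimiters: "/" in the replacement needs no escaping,
# but "[" and "]" are regex metacharacters and still must be escaped.
(my $slashed = $path) =~ s{\[}{/};
$slashed =~ s{\]}{/};

# Single quotes only disable interpolation; an unescaped "[" still opens a
# character class, so this code fails to compile and eval returns undef.
my $compiled = eval q{ my $copy = $path; $copy =~ s'['/'; 1 };

print "$slashed\n";
print defined $compiled ? "s'['/' compiled\n" : "s'['/' failed: $@";
```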
Furthermore, this regex is pretty hard to read:
$vm =~ m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;
You're using the /x modifier, so why not take advantage of it?
$vm =~ m/^(\d+) \s+ # $1: number of some sort
(\S+) \s+ # $2: identifier we're interested in
(\S+) \s+ # $3: VMX filename part a
(\S+) \s+ # $4: VMX filename part b
(\S+) \s+ # $5: another identifier
(\D+)(\D) # $6, $7: at least two nondigits
(\d) # $8: digit
(\d) # $9: version digit
/x;
I'd also consider using named captures:
$vm =~ m/^(?: \d+) \s+ # number of some sort
(?<ID> \S+) \s+ # $+{ID}: identifier we're interested in
(?<VMXa> \S+) \s+ # $+{VMXa}: VMX filename part a
(?<VMXb> \S+) \s+ # $+{VMXb}: VMX filename part b
(?: \S+) \s+
(?:\D+)(?:\D) # at least two nondigits
(?:\d) # one digit
(?<VERSION> \d) # $+{VERSION}: version digit
/x;
Now, instead of cryptic references to $2 and $9 afterward, you have clear, obvious, self-documenting references to $+{ID} and $+{VERSION}. I've made the rest of the groups into non-capturing groups, (?:regex), but if I want to capture one at a later date I can make it into a named capture without changing the indices of all the other captures, unlike with positional capturing.
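A runnable sketch of the named-capture approach feeding the same hash the question builds. The input is a made-up getallvms line (already passed through the [ and ] replacements), not real output from the question, and the group names (ID, NAME, VMXa, VMXb, VERSION) are my own choice so they line up with the question's hash keys:

```perl
use strict;
use warnings;

# Hypothetical getallvms line after the "[" and "]" replacements.
my $vm = '16     NS02   /datastore1/ NS02/NS02.vmx   ubuntu64Guest   vmx-07';

my %virtual_machines;
if ($vm =~ m/^(?<ID> \d+)   \s+   # numeric VM id
             (?<NAME> \S+)  \s+   # VM name, used as the hash key
             (?<VMXa> \S+)  \s+   # datastore part of the VMX path
             (?<VMXb> \S+)  \s+   # file part of the VMX path
             (?: \S+)       \s+   # guest OS id, not needed here
             (?: \D+) (?: \D)     # "vmx" and "-"
             (?: \d)              # first version digit
             (?<VERSION> \d)      # last version digit
            /x) {
    # Copy out of %+ right away; the next successful match replaces it.
    $virtual_machines{ $+{NAME} } = {
        ID      => $+{ID},
        VMX     => "/vmfs/volumes$+{VMXa}$+{VMXb}",
        Version => $+{VERSION},
    };
}

print "$_ => $virtual_machines{$_}{VMX}\n" for sort keys %virtual_machines;
```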
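Named captures are also less likely to suffer from the old-value problem mentioned above, where a failed match leaves all the $1-style variables in their old state. (%+ is likewise only updated by a successful match, though, so checking whether the match succeeded is still the safest habit.)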
Upvotes: 5