I have a file with contents like this:
*** X REGION ***
|-------------------------------------------------------------------------------------------------|
| X |
| addr tag extra data |
|-------------------------------------------------------------------------------------------------|
| $A1 label_A1X | 1 |
| $A2 label_A2X | 2 |
| $A3 label_A3X | 3 |
*** Y REGION ***
|-------------------------------------------------------------------------------------------------|
| Y |
| addr tag extra data |
|-------------------------------------------------------------------------------------------------|
| $0 label_0Y | 99 |
| $1 | 98 |
I need to capture the data under 'addr' and 'tag', separated by commas, separately for the records under 'X REGION' and 'Y REGION'. Here's what I tried:
open($fh1, "<", $memFile) or warn "Cannot open $memFile, $!"; #input file with contents as described above.
open($fh, "+<", $XFile) or warn "Cannot open $XFile, $!";
open($fh2, "+<", $YFile) or warn "Cannot open $YFile, $!";
while(my $line = <$fh1>)
{
    chomp $line;
    $line = $line if (/\s+\*\*\*\s+X REGION\s+\*\*\*/ .. /\s+\*\*\*\s+Y REGION\s+\*\*\*/); #Trying to get at the stuff in the X region.
    if($line =~ /\s+|\s+\$(.*)\s+(.*)\s+|(.*)/)
    {
        $line = "$1,$2";
        print $fh $line;
        print $fh "\n";
    }
    my $lastLineNum = `tail -1 filename`;
    $line = $line if (/\*\*\* Y REGION \*\*\*/ .. $lastLineNum); #Trying to get at the stuff in the Y region.
    if($line =~ /\s+|\s+\$(.*)\s+(.*)\s+|(.*)/)
    {
        $line = "$1,$2";
        print $fh2 $line;
        print $fh2 "\n";
    }
}
Perl warns that $1 and $2 are uninitialized. Is the regex incorrect? If not (or in addition), what else is wrong?
This snippet does what you need, taking full advantage of Perl's default implicit variable $_:
# use die instead of warn: don't go ahead if the file can't be opened
open(my $fin, "<", $memFile) or die "Cannot open $memFile, $!";
while (<$fin>)
{
    # Flip-flop over the X region (from the X header through the Y header)
    if (/[*]{3}\h+X REGION\h+[*]{3}/ .. /[*]{3}\h+Y REGION\h+[*]{3}/) {
        print "X: $1,$2\n" if (/.*\$(\S*)\h*(\S*)\h*[|]/);
    }
    # Flip-flop from the Y header to the end; with undef as the end condition there is no need for an external tail
    if (/[*]{3}\h+Y REGION\h+[*]{3}/ .. undef) {
        print "Y: $1,$2\n" if (/.*\$(\S*)\h*(\S*)\h*[|]/);
    }
}
This is the output:
X: A1,label_A1X
X: A2,label_A2X
X: A3,label_A3X
Y: 0,label_0Y
Y: 1,
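A self-contained sketch of the same approach, assuming you want the X and Y records collected separately (the question writes them to two files) rather than printed with a prefix. The sample table is embedded in a heredoc (separator rows shortened) and read through an in-memory filehandle instead of $memFile; the array names @x_records and @y_records are chosen for this sketch only:

```perl
use strict;
use warnings;

# Sample table from the question, embedded for this sketch
# (in real code, open $memFile instead of the in-memory handle below)
my $data = <<'END';
*** X REGION ***
|-----------------------------|
| X |
| addr tag extra data |
|-----------------------------|
| $A1 label_A1X | 1 |
| $A2 label_A2X | 2 |
| $A3 label_A3X | 3 |
*** Y REGION ***
|-----------------------------|
| Y |
| addr tag extra data |
|-----------------------------|
| $0 label_0Y | 99 |
| $1 | 98 |
END

open(my $fin, '<', \$data) or die "Cannot open in-memory data: $!";

my (@x_records, @y_records);
while (<$fin>) {
    # X region: from the X header up to and including the Y header
    if (/[*]{3}\h+X REGION\h+[*]{3}/ .. /[*]{3}\h+Y REGION\h+[*]{3}/) {
        push @x_records, "$1,$2" if /.*\$(\S*)\h*(\S*)\h*[|]/;
    }
    # Y region: from the Y header to end of input (undef never closes the range)
    if (/[*]{3}\h+Y REGION\h+[*]{3}/ .. undef) {
        push @y_records, "$1,$2" if /.*\$(\S*)\h*(\S*)\h*[|]/;
    }
}
close($fin);

# One "addr,tag" record per line for each region
print "$_\n" for @x_records;
print "$_\n" for @y_records;
```

Replacing the two print loops with print $fh ... and print $fh2 ... would send each group to the question's $XFile and $YFile handles.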
As for your code, there are several points to fix:
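In your regex, the pipe | that you use as a column delimiter needs escaping, either with a backslash \| or with the character class [|] (I prefer the latter). Unescaped, | is the alternation operator, so your pattern reads as \s+ OR \s+\$(.*)\s+(.*)\s+ OR (.*). The first and last branches match almost anything, so the middle branch holding the two capture groups never wins, which is why Perl reports $1 and $2 as uninitialized.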
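A short demonstration of the unescaped-pipe problem on one row of the sample data. In list context a successful match returns every capture group (undef for groups that did not take part), so you can see which ones were set. The second pattern is the pipe-escaped regex used in this answer's code:

```perl
use strict;
use warnings;

my $row = '| $A1 label_A1X | 1 |';

# Unescaped | is alternation: \s+  OR  \s+\$(.*)\s+(.*)\s+  OR  (.*)
# The row starts with '|', so at position 0 only the third branch matches.
my @unescaped = $row =~ /\s+|\s+\$(.*)\s+(.*)\s+|(.*)/;

# With the pipe written as [|], it is a literal delimiter.
my @escaped = $row =~ /.*\$(\S*)\h*(\S*)\h*[|]/;

printf "unescaped: \$1=%s \$3=%s\n",
    defined $unescaped[0] ? $unescaped[0] : 'undef', $unescaped[2];
print "escaped: $escaped[0],$escaped[1]\n";
```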
\s also matches vertical whitespace such as newline \n and carriage return \r, so don't use it as a general stand-in for space or tab \t. Use \h (horizontal whitespace only) instead.
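A quick check of the difference, using a made-up string that contains a space followed by a newline (\h requires Perl 5.10 or later):

```perl
use strict;
use warnings;

my $text = "addr \ntag";    # a space, then a newline

my ($s_ws) = $text =~ /(\s+)/;   # \s also matches the newline
my ($h_ws) = $text =~ /(\h+)/;   # \h matches horizontal whitespace only

printf "\\s+ captured %d characters, \\h+ captured %d\n", length $s_ws, length $h_ws;
```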
You start the regex with \s+, but in your sample the first character of every table line is '|', with no whitespace before it.
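.* matches any run of characters except a newline \n (spaces included), and it is greedy. So in a regex like .*\s+, the .* runs to the end of the line and gives back only as much as \s+ needs; on a multi-line string, \s+ can then also match the newline and any leading spaces of the next line.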
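A sketch comparing a greedy (.*) pair with \S* groups on one sample row. The first pattern is a variant of the question's regex, written for this comparison, with the pipes escaped and \h in place of \s; the second is the regex from this answer's code without the leading .*:

```perl
use strict;
use warnings;

my $row = '| $A1 label_A1X | 1 |';

# Greedy (.*) runs across the column delimiter and only backs off
# as far as the rest of the pattern requires.
my @greedy = $row =~ /\$(.*)\h+(.*)\h+[|]/;

# \S* stops at the first space, so each group stays inside one field.
my @bounded = $row =~ /\$(\S*)\h*(\S*)\h*[|]/;

print "greedy:  [$greedy[0]] [$greedy[1]]\n";
print "bounded: [$bounded[0]] [$bounded[1]]\n";
```

The greedy version captures 'A1 label_A1X |' and '1': the first (.*) swallows the tag column and the delimiter after it, leaving the second group only the 'extra data' value.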
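The Perl flip-flop operator .. selects the lines of the given region (boundary lines included), but your loop still sees them one line at a time, as always. So even the escaped-pipe form of your regex, \s+[|]\s+\$(.*)\s+(.*)\s+[|](.*), can't match any row: it requires whitespace before the pipe that precedes the '$', and there is none. Note also that a bare /.../ matches against $_, not $line, and $line = $line if (...) doesn't filter anything, so both of your if blocks run on every line.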
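In the question's loop each line is held in $line, while a bare /.../ pattern matches against $_. A minimal sketch of binding the flip-flop's operands to $line explicitly, which also shows that the range yields its lines one at a time with both boundary lines included (the four sample lines are copied from the question's data):

```perl
use strict;
use warnings;

# Four lines copied from the question's sample data
my @lines = (
    '*** X REGION ***',
    '| $A1 label_A1X | 1 |',
    '*** Y REGION ***',
    '| $0 label_0Y | 99 |',
);

my @selected;
for my $line (@lines) {
    # Both operands are bound to $line; a bare /X REGION/ would test $_ instead
    if ($line =~ /X REGION/ .. $line =~ /Y REGION/) {
        push @selected, $line;
    }
}
print "$_\n" for @selected;
```

The X header, the A1 row and the Y header are selected; the Y-region row after the closing boundary is not.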
So I replaced the data-extraction regex with this one:
.*\$(\S*)\h*(\S*)\h*[|]
Regex breakdown
.*\$     # matches everything up to the last literal dollar sign '$'
(\S*)    # capturing group $1: zero or more non-whitespace characters [^\s]
         # can be replaced with (\w*) if your labels only contain [0-9a-zA-Z_]
\h*      # zero or more horizontal whitespace characters
(\S*)    # capturing group $2, as above
\h*      # zero or more horizontal whitespace characters
[|]      # a literal pipe '|'
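To see what the (\w*) substitution changes, here is a sketch running both variants on a sample row and on a second row with a hypothetical hyphenated label (label-A4X is invented for illustration, it is not in the question's data):

```perl
use strict;
use warnings;

# Second row uses a hypothetical label containing a hyphen
my @rows = ('| $A3 label_A3X | 3 |', '| $A4 label-A4X | 4 |');

my (@with_S, @with_w);
for my $row (@rows) {
    push @with_S, $row =~ /.*\$(\S*)\h*(\S*)\h*[|]/ ? "$1,$2" : 'no match';
    push @with_w, $row =~ /.*\$(\w*)\h*(\w*)\h*[|]/ ? "$1,$2" : 'no match';
}
print "\\S*: @with_S\n";
print "\\w*: @with_w\n";
```

With \w*, the hyphen stops the second group before it reaches the pipe, so the whole match fails and that row is skipped rather than truncated.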