Reputation: 215
I have an input file with the following syntax:
00000 INFO [IVS ] reset receiver
00000 INFO [IVS ] reset transmitter
00331 INFO [IVS ] sync detected
The data is required in this form:
frame=00000
info=INFO
TYPE=[IVS ]
message=reset receiver
($frame,$info,$type,$message)=split(what would be the argument?);
Note: there is a space after IVS, before the closing bracket, so I can't use a space as the separator.
Upvotes: 0
Views: 399
Reputation: 3601
I agree with @hobbs, but you should use the expanded (/x) format for complex regular expressions:
while( my $line = <DATA> ){
    chomp $line;
    my ( $frame, $info, $type, $message ) =
        $line =~ m{
            \A        # start at the beginning of the string
            (\d+)     # capture a string of digits --> $frame
            \s+       # skip the white space
            (\S+)     # capture a string of non-spaces --> $info
            \s+       # skip the white space
            (         # start a capture --> $type
              \[      # capture an opening bracket
              [^\]]*  # capture everything that's not a closing bracket
              \]      # capture the closing bracket
            )         # end the capture
            \s+       # skip the white space
            (.*)      # capture the remainder of the line --> $message
        }msx;
    print "\$frame = $frame\n";
    print "\$info = $info\n";
    print "\$type = $type\n";
    print "\$message = $message\n";
    print "\n";
}
__DATA__
00000 INFO [IVS ] reset receiver
00000 INFO [IVS ] reset transmitter
00331 INFO [IVS ] sync detected
Upvotes: 2
Reputation: 106385
I love regexes, but... TIMTOWTDI as well. )
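while (<DATA>) {
    # Field widths must match the real columns; for the sample below they are
    # 6 (frame + space), 5 (info + space), 7 (type + space), then the rest.
    printf "frame=%s\ninfo=%s\nTYPE=%s\nmessage=%s\n",
        unpack("A6 A5 A7 A*", $_);
}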
__DATA__
00000 INFO [IVS ] reset receiver
00000 INFO [IVS ] reset transmitter
00331 INFO [IVS ] sync detected
Seriously, though, the point is that it might be better to split your data string with one simple unpack (yes, unpack is simple, it just needs a bit of practice) than with some twisted regexes, provided all data columns have a fixed width. Sometimes that's just the case.
Upvotes: 3
Reputation: 35790
You want to split on space, as long as the space isn't followed by a ]. This means you want to use a negative lookahead in your regular expression. Don't forget that split() can take a regular expression as its first argument. It can also take the maximum number of fields to return, so if you do:
my ($frame, $info, $type, $message) = split(/\s+(?!])/, $line, 4);
...then you'll get out what you want.
This split() splits on one or more whitespace characters that aren't followed by a ]. It also returns at most four fields, so you won't split up your $message field (everything after the third split will just end up in $message).
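For illustration, here is a minimal sketch of the lookahead split applied to the sample lines from the question. The `parse_line` helper and the `@lines` array are hypothetical names introduced only for this example, and the `]` in the lookahead is escaped for readability (it should behave the same as the unescaped form).

```perl
use strict;
use warnings;

# Sample lines copied from the question.
my @lines = (
    '00000 INFO [IVS ] reset receiver',
    '00000 INFO [IVS ] reset transmitter',
    '00331 INFO [IVS ] sync detected',
);

# Hypothetical helper: split one log line into at most four fields on
# whitespace that is not immediately followed by a closing bracket.
sub parse_line {
    my ($line) = @_;
    return split /\s+(?!\])/, $line, 4;
}

for my $line (@lines) {
    my ($frame, $info, $type, $message) = parse_line($line);
    print "frame=$frame\ninfo=$info\nTYPE=$type\nmessage=$message\n\n";
}
```

One caveat: if the bracketed field can contain more than one space before the ], the \s+ can backtrack to a shorter match and still split inside the brackets. A stricter lookahead such as (?![\s\]]) is one possible way around that.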
Upvotes: 2
Reputation: 239781
Wrong question. You don't want to use split. The rule of thumb is: use a regex match when you know what your data looks like; use split when you know what your delimiters look like.
my ($frame, $info, $type, $message) =
$data =~ /(\d+) (\S+)\s+\[(\S+)\s*\] (.*)/;
would be a pretty good start.
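As a rough sketch of how that match might be used in practice, the example below wraps the same pattern in a hypothetical `match_line` helper (not part of the original answer) and checks whether the match succeeded before using the fields.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four captured fields, or an empty list
# when the line does not look like a log record.
sub match_line {
    my ($data) = @_;
    return $data =~ /(\d+) (\S+)\s+\[(\S+)\s*\] (.*)/;
}

for my $data ('00331 INFO [IVS ] sync detected', 'not a log line') {
    # A list assignment in boolean context is true only if something matched.
    if (my ($frame, $info, $type, $message) = match_line($data)) {
        print "frame=$frame info=$info type=$type message=$message\n";
    }
    else {
        print "no match: $data\n";
    }
}
```

Note that this pattern captures the type without its brackets (IVS rather than [IVS ]). To keep the brackets, as in the output the question asks for, the capture group could be widened to (\[\S+\s*\]).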
Upvotes: 7