fammi

perl split function usage

I have an input file in the following format:

00000 INFO  [IVS ] reset receiver  
00000 INFO  [IVS ] reset transmitter  
00331 INFO  [IVS ] sync detected     

I need the data in this form:

frame=00000  
info=INFO  
TYPE=[IVS ]  
message=reset receiver  

($frame,$info,$type,$message)=split(what would be the argument?);

Note: there is a space after IVS, before the closing bracket, so I can't use a space as the separator.

Answers (4)

shawnhcorey

I agree with @hobbs, but for complex regular expressions you should use the extended format (the /x modifier), which lets you add whitespace and comments:

use strict;
use warnings;

while( my $line = <DATA> ){
  chomp $line;

  my ( $frame, $info, $type, $message ) = 
    $line =~ m{
      \A        # start at the beginning of the string
      (\d+)     # capture a string of digits        --> $frame
      \s+       # skip the white space
      (\S+)     # capture a string of non-spaces    --> $info
      \s+       # skip the white space
      (         # start a capture                   --> $type
        \[      #   match an opening bracket
        [^\]]*  #   match everything that's not a closing bracket
        \]      #   match the closing bracket
      )         # end the capture
      \s+       # skip the white space
      (.*)      # capture the remainder of the line --> $message
    }msx;

  print "\$frame   = $frame\n";
  print "\$info    = $info\n";
  print "\$type    = $type\n";
  print "\$message = $message\n";
  print "\n";
}

__DATA__
00000 INFO  [IVS ] reset receiver
00000 INFO  [IVS ] reset transmitter
00331 INFO  [IVS ] sync detected
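
For comparison, here is a sketch of the same extended pattern rewritten with named captures (supported since Perl 5.10), so each field is read from the %+ hash by name instead of by position. The helper parse_log_line and the inline sample lines are illustrative assumptions; this variant also trims trailing whitespace from the message.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one log line into (name => value) pairs.
# Returns an empty list when the line does not match.
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~ m{
        \A
        (?<frame>\d+)           # frame number, e.g. 00331
        \s+
        (?<info>\S+)            # log level, e.g. INFO
        \s+
        (?<type>\[[^\]]*\])     # bracketed tag, inner space kept
        \s+
        (?<message>.*?)         # rest of the line...
        \s*\z                   # ...minus trailing whitespace/newline
    }x;
    return %+;                  # copy the named captures out of %+
}

my @lines = (
    "00000 INFO  [IVS ] reset receiver\n",
    "00000 INFO  [IVS ] reset transmitter\n",
    "00331 INFO  [IVS ] sync detected\n",
);

for my $line (@lines) {
    my %f = parse_log_line($line) or next;   # skip non-matching lines
    print "frame=$f{frame}\ninfo=$f{info}\nTYPE=$f{type}\nmessage=$f{message}\n\n";
}
```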

raina77ow

I love regexes, but... TIMTOWTDI (there's more than one way to do it) applies here as well. :)

while (<DATA>) {
  printf "frame=%s\ninfo=%s\nTYPE=%s\nmessage=%s\n", 
    unpack("A6 A6 A7 A*", $_);
}

__DATA__
00000 INFO  [IVS ] reset receiver
00000 INFO  [IVS ] reset transmitter
00331 INFO  [IVS ] sync detected

Seriously, though: if all the data columns have fixed widths, it may be better to split the string with a single unpack (yes, unpack is simple; it just takes a bit of practice) than with a convoluted regex. And sometimes fixed-width columns are exactly what you have. :)
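
To make the fixed-width assumption concrete, here is a sketch that spells out where the widths in the template come from, measured from the sample lines: "00000 " is 6 characters, "INFO  " is 6, and "[IVS ] " is 7, with A* taking the rest. When unpacking, A strips trailing whitespace and nulls from each field, which is why "[IVS ]" keeps its inner space (the "]" ends the field) while the newline disappears from the message. The helper name unpack_line is hypothetical.

```perl
use strict;
use warnings;

# Widths assumed from the sample data; every line must use the same layout.
#   A6 -> "00000 "   A6 -> "INFO  "   A7 -> "[IVS ] "   A* -> rest of line
my $template = 'A6 A6 A7 A*';

# Hypothetical helper: 'A' strips trailing whitespace (including "\n")
# from each unpacked field.
sub unpack_line {
    my ($line) = @_;
    return unpack $template, $line;
}

my ($frame, $info, $type, $message) =
    unpack_line("00331 INFO  [IVS ] sync detected\n");
print "frame=$frame\ninfo=$info\nTYPE=$type\nmessage=$message\n";
```

If a column can ever be wider than its slot (a longer log level, say), the fields shift and the regex or split approaches are safer.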

CanSpice

You want to split on whitespace, as long as the whitespace isn't followed by a ]. That calls for a negative lookahead in your regular expression. Remember that split() takes a regular expression as its first argument, not just a fixed string. It also accepts an optional third argument that limits the number of fields it returns, so if you do:

my ($frame, $info, $type, $message) = split(/\s+(?!])/, $line, 4);

...then you'll get out what you want.

This split() splits on one or more whitespace characters that aren't followed by a ]. Because the limit is 4, it returns at most four fields, so your message isn't broken up: everything after the third separator ends up in $message. (chomp $line first, or $message will keep the trailing newline.)
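
A quick sketch of what the limit buys you, running the same pattern with and without it on one of the sample lines (the variable names are illustrative):

```perl
use strict;
use warnings;

my $line = "00000 INFO  [IVS ] reset receiver";

# Split on whitespace runs that are not followed by ']'.
my @limited   = split /\s+(?!\])/, $line, 4;   # at most 4 fields
my @unlimited = split /\s+(?!\])/, $line;      # no limit

print scalar(@limited),   " fields: ", join('|', @limited),   "\n";
# prints "4 fields: 00000|INFO|[IVS ]|reset receiver"
print scalar(@unlimited), " fields: ", join('|', @unlimited), "\n";
# prints "5 fields: 00000|INFO|[IVS ]|reset|receiver"
```

Without the limit, the space inside the message is also a valid separator, so "reset receiver" is split into two fields.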

hobbs

Wrong question. You don't want to use split. The rule of thumb is: use a regex match when you know what your data looks like; use split when you know what your delimiters look like.

my ($frame, $info, $type, $message) = 
    $data =~ /(\d+) (\S+)\s+\[(\S+)\s*\] (.*)/;

would be a pretty good start.
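
The pattern above captures $type as IVS, without its brackets, while the question asks for TYPE=[IVS ]. Here is a sketch of one way to close that gap (an illustration, not hobbs's own code): put the brackets inside the capture group and print the fields in the requested key=value layout. The helper parse_with_brackets is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: same shape as the pattern above, but the brackets
# (and the space inside them) are part of the $type capture.
sub parse_with_brackets {
    my ($data) = @_;
    chomp $data;
    return $data =~ /^(\d+) (\S+)\s+(\[[^\]]*\]) (.*)/;
}

for my $data ("00000 INFO  [IVS ] reset receiver\n",
              "00331 INFO  [IVS ] sync detected\n") {
    my ($frame, $info, $type, $message) = parse_with_brackets($data)
        or next;    # skip lines that don't fit the format
    print "frame=$frame\ninfo=$info\nTYPE=$type\nmessage=$message\n\n";
}
```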
