scraft3613

Reputation: 247

How can I break apart fixed-width columns in Perl?

I'm so new to programming that I apologize for not knowing how to phrase the question.

I have a Perl script that gets a variable from an internal tool. The contents vary, but they always follow this pattern:

darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5

With Perl, what's the easiest way to get each of these lines into its own variable? The lines differ depending on what the internal tool outputs, and there can be more than five of them. Each line will end up being sorted by its capitalized letter (all As, all Cs, all Es, etc.). Should I be looking at regular expressions?

Upvotes: 10

Views: 5085

Answers (5)

Evan Carroll

Reputation: 1

Use CPAN and my module DataExtract::FixedWidth:

#!/usr/bin/env perl
use strict;
use warnings;
use DataExtract::FixedWidth;

my @rows = <DATA>;

my $defw = DataExtract::FixedWidth->new({ heuristic => \@rows, header_row => undef });

use Data::Dumper;

print Dumper $defw->parse( $_ ) for @rows;

__DATA__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5

It doesn't get much simpler than that.

Upvotes: -1

brian d foy

Reputation: 132914

I like using unpack for this sort of thing. It's fast, flexible, and reversible.

You just need to know the positions for each column, and unpack can automatically trim the extra whitespace from each column.

If you change something in one of the columns, it's easy to go back to the original format by repacking with the same format:

my $format = 'A23 A8 A7 A*';

my @grades;

while( <DATA> ) {
    chomp( my $line = $_ );

    my( $machine, $year, $letter, $sentence ) =
        unpack( $format, $line );

    # save the original line too (newline included), which might be useful later
    push @grades, [ $machine, $year, $letter, $sentence, $_ ];
    }

my @sorted = sort { $a->[2] cmp $b->[2] } @grades;

foreach my $tuple ( @sorted ) {
    print $tuple->[-1];
    }

# go the other way, especially if you changed things
foreach my $tuple ( @sorted ) {
    print pack( $format, @$tuple[0..3] ), "\n";
    }

__END__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5
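As a small sketch of that round trip on a single line, assuming the same column widths as above (the replacement year 2001 is a made-up value, purely for illustration):

```perl
use strict;
use warnings;

my $format = 'A23 A8 A7 A*';   # same column widths as above
my $line   = 'darren.local           1987    A      Sentence1';

# unpack strips the trailing padding from each 'A' field
my @fields = unpack( $format, $line );

# change one column (2001 is a made-up value for illustration)
$fields[1] = 2001;

# pack pads each field back out to its column width
my $repacked = pack( $format, @fields );

print "$repacked\n";
```

The repacked line keeps the original column positions because pack pads each 'A' field with spaces up to the width given in the format.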

Now, there's an additional consideration. It sounds like you might have this big chunk of multi-line text in a single variable. Handle this as you would a file by opening a filehandle on a reference to the scalar. The filehandle stuff takes care of the rest:

 my $lines = '...multiline string...';

 open my($fh), '<', \ $lines;

 while( <$fh> ) {
      ... same as before ...
      }
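Spelled out in full, that might look something like the sketch below. The contents of $lines are sample data standing in for whatever the internal tool returns, and @rows is just a name picked for the example:

```perl
use strict;
use warnings;

my $format = 'A23 A8 A7 A*';

# $lines stands in for the tool's output (sample data only)
my $lines = join '',
    "darren.local           1996    C      Sentence2\n",
    "darren.local           1987    A      Sentence1\n";

# open a read-only filehandle on the string itself
open my $fh, '<', \$lines or die "Cannot open in-memory filehandle: $!";

my @rows;
while ( my $line = <$fh> ) {
    chomp $line;
    push @rows, [ unpack( $format, $line ) ];
}
close $fh;

# sort by the letter column, then print letter and sentence for each row
for my $row ( sort { $a->[2] cmp $b->[2] } @rows ) {
    print "$row->[2]: $row->[3]\n";
}
```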

Upvotes: 21

Jeremy Wall

Reputation: 25275

For each line of text something like this:

my ($domain, $year, $grade, @text) = split /\s+/, $line;

I use an array for the sentence since it's not clear whether the sentence at the end will contain spaces. You can then join the @text array into a new string if necessary. If the sentences at the end are never going to contain spaces, you can turn @text into $text.
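A rough sketch of the split-then-join idea, using a made-up sample line whose sentence contains spaces (the name $sentence is only for illustration):

```perl
use strict;
use warnings;

# sample line; the multi-word sentence is made up to show the join
my $line = 'darren.local           1987    A      A sentence with spaces';

my ( $domain, $year, $grade, @text ) = split /\s+/, $line;

# glue the remaining words back into one string
my $sentence = join ' ', @text;

print "$grade: $sentence\n";
```

Note that joining on a single space collapses any runs of whitespace that were inside the original sentence.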

Upvotes: 0

Nifle

Reputation: 11933

use strict;
use warnings;

# this puts each line in the array @lines
my @lines = <DATA>; # <DATA> is a special filehandle that treats
                    # everything after __END__ as if it was a file
                    # It's handy for testing things

# Iterate over the array of lines and for each iteration
# put that line into the variable $line
foreach my $line (@lines) {
   # Use split to 'split' each $line with the regular expression /\s+/
   # /\s+/ means match one or more whitespace characters.
   # the 4 limits the result to four fields, so any whitespace after
   # the third separator is left as-is and included in $col4
   my ($col1, $col2, $col3, $col4) = split(/\s+/, $line, 4);

   # here you can do whatever you need to with the data
   # in the columns. I just print them out
   print "$col1, $col2, $col3, $col4 \n";
}


__END__
darren.local           1987    A      Sentence1
darren.local           1996    C      Sentence2
darren.local           1991    E      Sentence3
darren.local           1954    G      Sentence4
darren.local           1998    H      Sentence5

Upvotes: 3

Richard H

Reputation: 39135

Assuming that the text is in a single variable $info, you can split it into separate lines using the built-in Perl split function:

my @lines = split("\n", $info); 

where @lines is an array of your lines. The "\n" is the regex for a newline. You can loop through each line as follows:

foreach (@lines) {
   my $line = $_;
   # do something with $line....
}

You can then split each line on whitespace (regex \s+, where the \s is one whitespace character, and the + means 1 or more times):

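my @fields = split(/\s+/, $line);

and you can then access each field directly via its array index: $fields[0], $fields[1] etc.

or, you can do:

my ($var1, $var2, $var3, $var4) = split(/\s+/, $line);

which will put the fields in each line into separate named variables. Note that the pattern has to be written as a regex literal, /\s+/: inside a double-quoted string, "\s+" turns into "s+" and would split on the letter s instead.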

Now - if you want to sort your lines by the character in the third column, you could do this:

my @lines = split("\n", $info); 
my @arr = ();    # declare new array

foreach (@lines) {
   my @fields = split(/\s+/, $_);
   push(@arr, \@fields);   # add @fields REFERENCE to @arr
}

Now you have an "array of arrays". This can easily be sorted as follows:

my @sorted = sort { $a->[2] cmp $b->[2] } @arr;   # cmp compares strings; <=> is for numbers

which will sort @arr by the 3rd element (index 2) of @fields.

Edit 2: To put lines with the same third column into their own variables, do this:

my %hash = ();             # declare new hash

foreach my $line (@arr) {  # loop through lines (each one is an array reference)
  my @fields = @$line;     # dereference the field array

  my $el = $fields[2];     # get our key - the character in the third column
  my $text = join(" ", @fields);    # rebuild the line text from its fields

  my $val = "";
  if (exists $hash{ $el }) {        # check if key already in hash
     $val = $hash{ $el };           # get the current value for key
     $val = $val . "\n" . $text;    # append new line to hash value
  } else {
     $val = $text;
  }
  $hash{ $el } = $val;         # put the new value (back) into the hash
}

Now you have a hash keyed with the third column characters, with the value for each key being the lines that contain that key. You can then loop through the hash and print out or otherwise use the hash values.
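For what it's worth, here is a self-contained sketch of the grouping and the final loop. It swaps in a slightly different technique from the code above: each hash value is an array of lines rather than one newline-joined string. The sample $info and the name %by_letter are made up for illustration:

```perl
use strict;
use warnings;

# sample input standing in for $info (made up for illustration)
my $info = join "\n",
    'darren.local 1996 C Sentence2',
    'darren.local 1987 A Sentence1',
    'darren.local 1991 A Sentence3';

# group whole lines by the letter in the third column
my %by_letter;
for my $line ( split /\n/, $info ) {
    my @fields = split /\s+/, $line;
    push @{ $by_letter{ $fields[2] } }, $line;
}

# walk the keys in sorted order and print each group
for my $letter ( sort keys %by_letter ) {
    print "$letter:\n";
    print "  $_\n" for @{ $by_letter{$letter} };
}
```

Keeping an array per key avoids having to split the value apart again if you later need the individual lines.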

Upvotes: 2
