user2837756

Reputation: 39

Perl file parser for dynamic file

I'm new to Perl and could really use some help writing a file parser. The file is built up like this (X is a number that changes from file to file and gives the number of following lines that each contain a column heading):

X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX 
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20, 
2013 138 22:42:28, 10, 10, 10, 20, 
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:35, 10, 10, 10, 20, 
2013 138 22:42:36, 10, 10, 10, 20, 
2013 138 22:42:37, 10, 10, 10, 20, 
2013 138 22:42:38, 10, 10, 10, 20, 
2013 138 22:42:39, 10, 10, 10, 20, 
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20, 
2013 138 22:42:42, 10, 10, 10, 20, 
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:46, 10, 10, 10, 20, 
2013 138 22:42:47, 10, 10, 10, 20, 
# 2013 138 22:42:48 : Random text

The parser needs to transpose the Col_headings into tab-separated values on one line, and list all lines between # 2013 138 22:42:33 - Event $eventname starting ($eventid) and # 2013 138 22:42:45 - Event $eventname ended ($eventid) that do not start with a #. The values must also be changed from comma-separated to tab-separated.

The output file should then look like:

Filename:/home/..../filename    What:$eventname Where:SYSTEM    ID:$eventid
Time                Col_heading1    Col_heading2    Col_heading3    Col_headingX
2013 138 22:42:35   10              10              10              20
2013 138 22:42:36   10              10              10              20
2013 138 22:42:37   10              10              10              20
2013 138 22:42:38   10              10              10              20
2013 138 22:42:39   10              10              10              20 
2013 138 22:42:41   10              10              10              20 
2013 138 22:42:42   10              10              10              20

Any help with this would be very much appreciated!

Upvotes: 0

Views: 153

Answers (1)

RobEarl

Reputation: 7912

Once you've opened the file you can get the number from the first line with:

my ($heading_count) = split /,/, <$fh>;

Then loop to get the headings:

my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove leading # + whitespace
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}

Once you've done that, loop through the rest of the file, parsing and printing any rows between the start/end patterns. Here is a fairly simplistic example to get you started:

print join("\t", @headings), "\n"; # Print the headings; the parentheses keep "\n" out of the joined list
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) { # Keep reading from the same filehandle as above
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting, event name is now in $1, event id in $2
        $in_event = 1;
        next;
    }
    next unless $in_event; # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading if the event ends
    next if /^#/; # Skip comments

    s/,\s?/\t/g; # Replace commas with tabs
    print; # Print the row
}
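
Putting the steps above together, here is a minimal sketch of how the pieces might fit into one routine. The sub name `parse_event`, the `demo.log` filename and the sample data are made up for illustration, and `Where:SYSTEM` is simply copied from the desired output in the question. It collects the rows before printing because the event name and id are only known once the start marker has been seen.

```perl
use strict;
use warnings;

# Parse one log from an open filehandle and return the output lines.
# parse_event and $filename are hypothetical names used for this sketch.
sub parse_event {
    my ($fh, $filename) = @_;

    # First line: the leading field is the number of heading lines
    my ($heading_count) = split /,/, scalar <$fh>;

    my @headings = ('Time');
    for (1 .. $heading_count) {
        chomp(my $heading = <$fh>);
        $heading =~ s/^#\s*//;    # drop the leading "# "
        $heading =~ s/\s+$//;     # drop trailing whitespace
        push @headings, $heading;
    }

    my ($name, $id, @rows);
    while (my $line = <$fh>) {
        chomp $line;
        if (!defined $name) {
            # Not in an event yet: look for the start marker
            ($name, $id) = ($1, $2)
                if $line =~ /Event (.*) starting \((.*)\)/;
            next;
        }
        last if $line =~ /Event .* ended/;    # end marker
        next if $line =~ /^#/;                # comment inside the event
        $line =~ s/,\s*$//;                   # trailing comma, if any
        $line =~ s/,\s*/\t/g;                 # commas to tabs
        push @rows, $line;
    }
    return unless defined $name;              # no event found

    return (
        "Filename:$filename\tWhat:$name\tWhere:SYSTEM\tID:$id",
        join("\t", @headings),
        @rows,
    );
}

# Made-up sample in the layout described in the question
my $sample = <<'END';
2,1,0,0
# Col_heading1
# Col_heading2
# 2013 138 22:42:21 - Random text
2013 138 22:42:26, 10, 20,
# 2013 138 22:42:33 - Event Demo starting (7)
2013 138 22:42:35, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 11, 21,
# 2013 138 22:42:45 - Event Demo ended (7)
2013 138 22:42:46, 10, 20,
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for parse_event($fh, 'demo.log');
```

For a real file you would open it by name instead of using the in-memory string, and write the returned lines to your output file.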

With this approach the column headings won't line up with the data, because the headings and values differ in length. You'll either need to tweak it to get exactly what is required, or look into Text::CSV (or plain split) for parsing the rows and something like Text::Table to produce a proper table.
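
If you would rather not install anything from CPAN, the alignment can also be approximated with core Perl only. The sketch below does not use Text::Table; it swaps in `split` plus `sprintf`, padding each column to the width of its longest cell. The helper name `format_table` and the sample rows are invented for illustration, and it assumes every row has the same number of cells.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Pad every column to the width of its longest cell.
# format_table is a hypothetical helper; each argument is an array ref of cells.
sub format_table {
    my @rows = @_;
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            $width[$i] = max($width[$i] // 0, length $row->[$i]);
        }
    }
    # Left-justify each cell, with two spaces between columns
    my $format = join('  ', map { "%-${_}s" } @width);
    return map {
        my $line = sprintf($format, @$_);
        $line =~ s/\s+$//;    # drop the padding after the last column
        $line;
    } @rows;
}

my @headings = ('Time', 'Col_heading1', 'Col_heading2');

# split discards the empty field left by the trailing comma
my @data = map { [ split /\s*,\s*/ ] }
    '2013 138 22:42:35, 10, 20,',
    '2013 138 22:42:36, 10, 20,';

print "$_\n" for format_table(\@headings, @data);
```

This gives space-padded columns rather than tab-separated ones, so it only fits if the output is meant to be read by people and not by another tab-aware tool.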

Upvotes: 1
