Reputation: 39
I'm new to Perl and could really use some help writing a file parser. The file is laid out like this (X is a number that changes from file to file and gives the number of following lines that contain a column heading):
X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20,
2013 138 22:42:28, 10, 10, 10, 20,
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:35, 10, 10, 10, 20,
2013 138 22:42:36, 10, 10, 10, 20,
2013 138 22:42:37, 10, 10, 10, 20,
2013 138 22:42:38, 10, 10, 10, 20,
2013 138 22:42:39, 10, 10, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20,
2013 138 22:42:42, 10, 10, 10, 20,
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:46, 10, 10, 10, 20,
2013 138 22:42:47, 10, 10, 10, 20,
# 2013 138 22:42:48 : Random text
The parser needs to transpose the Col_headings into tab-separated values on one line, and list all lines between # 2013 138 22:42:33 - Event $eventname starting ($eventid) and # 2013 138 22:42:45 - Event $eventname ended ($eventid) that do not start with a #.
The values must also be changed from comma-separated to tab-separated.
The output file should then look like:
Filename:/home/..../filename What:$eventname Where:SYSTEM ID:$eventid
Time Col_heading1 Col_heading2 Col_heading3 Col_headingX
2013 138 22:42:35 10 10 10 20
2013 138 22:42:36 10 10 10 20
2013 138 22:42:37 10 10 10 20
2013 138 22:42:38 10 10 10 20
2013 138 22:42:39 10 10 10 20
2013 138 22:42:41 10 10 10 20
2013 138 22:42:42 10 10 10 20
Any help with this would be very much appreciated!
Upvotes: 0
Views: 153
Reputation: 7912
Once you've opened the file you can get the number from the first line with:
my ($heading_count) = split /,/, <$fh>;
Then loop to get the headings:
my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove leading # + whitespace
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}
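For a self-contained version of the two steps above, here is a minimal sketch that wraps them in a sub and runs it on an in-memory filehandle. The sub name read_headings and the sample text are made up for illustration; for a real file you would open it with the three-argument form of open, as noted in the comment.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the count line and the heading lines from an
# already opened filehandle and returns the list of column names.
sub read_headings {
    my ($fh) = @_;
    my $first = <$fh>;
    die "empty input\n" unless defined $first;
    my ($heading_count) = split /,/, $first;    # first field is the count
    my @headings = ('Time');
    for (1 .. $heading_count) {
        my $heading = <$fh>;
        die "file ended before all headings were read\n"
            unless defined $heading;
        chomp $heading;
        $heading =~ s/^#\s*//;    # drop the leading "#" and whitespace
        push @headings, $heading;
    }
    return @headings;
}

# Demo on an in-memory filehandle (sample text is invented). For a real
# file use:  open my $fh, '<', $path or die "Cannot open $path: $!";
my $sample = "3,1,0,0\n"
           . "# Col_heading1\n# Col_heading2\n# Col_heading3\n"
           . "# 2013 138 22:42:21 - Random text\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @headings = read_headings($fh);
close $fh;
print join("\t", @headings), "\n";
```

Checking that each read returned a defined value means a truncated file fails with a clear message instead of "uninitialized value" warnings.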
Once you've done that, loop through the rest of the file, parsing and printing any rows between the start/end patterns. Here is a fairly simplistic example to get you started:
print join("\t", @headings), "\n"; # Print the headings (parentheses keep "\n" out of the join)
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) {
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting, event name is now in $1, event id in $2
        $in_event = 1;
        next;
    }
    next unless $in_event; # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading once the event ends
    next if /^#/; # Skip comments
    chomp; # Remove the newline
    s/,\s*$//; # Drop the trailing comma
    s/,\s*/\t/g; # Replace the remaining commas with tabs
    print "$_\n"; # Print the row
}
You'll find that with this approach the column headings don't line up properly with the data because of the variable lengths, so you'll either need to tweak it to get exactly what is required, or look into Text::CSV for parsing the rows (or use split) and something like Text::Table to produce a proper table.
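As a rough sketch of the split route, the following uses only core Perl. It converts each data row with split and also collects the event name and id so that the "What:... ID:..." line from the question can be printed. The sub names row_to_tsv and extract_event, the event name Demo, the id 42 and the sample text are all invented for illustration, and the spacing of the header line is a guess at the format the question shows.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "2013 138 22:42:35, 10, 10, 10, 20," into a
# tab-separated string with no trailing tab.
sub row_to_tsv {
    my ($line) = @_;
    chomp $line;
    $line =~ s/,\s*$//;                     # drop the trailing comma, if any
    my @fields = split /\s*,\s*/, $line;    # split on commas, trimming spaces
    return join "\t", @fields;
}

# Hypothetical helper: returns the event name, the event id and the
# converted rows for the first event found on the filehandle.
sub extract_event {
    my ($fh) = @_;
    my ($name, $id, @rows);
    while (my $line = <$fh>) {
        if (!defined $name) {
            # Still looking for the start marker; non-greedy captures
            # stop at " starting" and at the closing parenthesis.
            ($name, $id) = ($1, $2)
                if $line =~ /Event (.*?) starting \((.*?)\)/;
            next;
        }
        last if $line =~ /Event .* ended/;  # event is over
        next if $line =~ /^#/;              # skip comment lines
        next unless $line =~ /\S/;          # skip blank lines
        push @rows, row_to_tsv($line);
    }
    return ($name, $id, @rows);
}

# Demo on invented sample data held in memory.
my $sample = <<'END';
# 2013 138 22:42:32 : Random text
2013 138 22:42:34, 1, 2, 3, 4,
# 2013 138 22:42:33 - Event Demo starting (42)
2013 138 22:42:35, 10, 10, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20,
# 2013 138 22:42:45 - Event Demo ended (42)
2013 138 22:42:46, 10, 10, 10, 20,
END
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ($name, $id, @rows) = extract_event($fh);
close $fh;
print "What:$name Where:SYSTEM ID:$id\n";
print "$_\n" for @rows;
```

Returning the rows instead of printing them immediately makes it possible to print the header line first, since the event name and id are only known once the start marker has been read.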
Upvotes: 1