Reputation: 3669
I have a few thousand reports containing consistently formatted tabular data embedded within them that I need to extract.
I have a few ideas, but thought I'd post to see if there's a better way than what I'm planning, which is to extract the tabular data, write it to a new file, then parse that file as tabular data.
Here's a sample input and output, where the output is read and written row by row to a database.
INPUT_FILE
MiscText MiscText MiscText
MiscText MiscText MiscText
MiscText MiscText MiscText
SubHeader
PASS 1283019238 alksdjalskdjl
FAIL 102310928301 kajdlkajsldkaj
PASS 102930192830 aoisdajsdoiaj
PASS 192830192301 jiasdojoasi
MiscText MiscText MiscText
MiscText MiscText MiscText
MiscText MiscText MiscText
OUTPUT (read/write row-by-row from text-file to DB)
ROW-01{column01,column02,column03}
...
ROW-nth{column01,column02,column03}
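The extract-and-emit step described above can be sketched in Perl as follows. Note the assumptions are mine, not from the report spec: data rows are whitespace-separated and start with PASS or FAIL, and the table ends at the first line that doesn't match that shape.

```perl
use strict;
use warnings;

# Scan for the "SubHeader" marker, collect the data rows that follow,
# and format them in the ROW-NN{...} shape shown above.
sub extract_rows {
    my @lines  = @_;
    my $in_tab = 0;
    my @rows;
    for my $line (@lines) {
        if ( $line eq 'SubHeader' ) { $in_tab = 1; next }
        next unless $in_tab;
        # Assumed heuristic: the table ends at the first non-matching line.
        last unless $line =~ /^(?:PASS|FAIL)\s/;
        my @cols = split ' ', $line;
        push @rows, sprintf 'ROW-%02d{%s}', scalar @rows + 1, join ',', @cols;
    }
    return @rows;
}

my @sample = (
    'MiscText MiscText MiscText',
    'SubHeader',
    'PASS 1283019238 alksdjalskdjl',
    'FAIL 102310928301 kajdlkajsldkaj',
    'MiscText MiscText MiscText',
);
print "$_\n" for extract_rows(@sample);
# Prints:
# ROW-01{PASS,1283019238,alksdjalskdjl}
# ROW-02{FAIL,102310928301,kajdlkajsldkaj}
```

Each formatted row could then be handed straight to a database insert instead of being printed, which avoids the intermediate file entirely.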
Upvotes: 0
Views: 260
Reputation: 8895
If this is fixed-width data, I would strongly suggest using unpack
or plain old substr
.
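For example, if the columns really are fixed-width, an unpack template would slice each line in one call. The widths below are invented for illustration; adjust the template to the real layout:

```perl
use strict;
use warnings;

# Hypothetical layout (not from the post): a 5-character status field,
# a 13-character id field, and free text for the remainder.
# unpack's 'A' template strips trailing spaces from each field.
my $line = "PASS 1283019238   alksdjalskdjl";
my ( $status, $id, $text ) = unpack 'A5 A13 A*', $line;
print "$status | $id | $text\n";   # PASS | 1283019238 | alksdjalskdjl

# The substr equivalent of the first field (note: substr does NOT
# strip the trailing space, so trim it yourself if needed):
my $raw_status = substr $line, 0, 5;   # "PASS "
```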
Upvotes: 0
Reputation: 15294
Recognizing when to start processing tabular data is easy: you've got the marker line. The difficulty is recognizing when to stop. You can apply the heuristic of stopping as soon as the split
no longer yields the expected number of columns.
use strict;
use warnings;

my $tab_data;   # true once the marker line has been seen
my $num_cols;   # column count of the first data row

while ( <> ) {
    # Start processing at the marker line.
    $tab_data = 1, next if $_ eq "SubHeader\n";
    next unless $tab_data;
    chomp;
    my @cols = split /\t/;
    # Remember the width of the first data row ...
    $num_cols ||= scalar @cols;
    # ... and stop at the first line that no longer matches it.
    last if $num_cols and $num_cols != scalar @cols;
    print join( "\t", @cols ), "\n";
}
Save it as etd.pl
(etd = extract tabular data, what did you think?), and call it like this from the command line:
perl etd.pl < your-mixed-input.txt
Upvotes: 2
Reputation: 24574
If you know how to extract data, why create a new file instead of processing it immediately?
Upvotes: 1