Reputation: 225
I am hoping to perform a series of edits to a large text file composed almost entirely of single letters separated by spaces. The file is about 300 rows by about 400,000 columns, and roughly 250 MB.
My goal is to transform this table using a series of steps, for eventual processing in another language (probably R). I don't have much experience working with big data files, but Perl has been suggested to me as the best way to go about this. Please let me know if there is a better way :).
So, I am hoping to write a Perl script that does the following:
1. Replace each character pair according to a sequential conditional algorithm running across each row
[example pseudocode: if character 1 of the cell and character 2 of the cell both equal 'a', the cell becomes 1; else if both equal 'b', the cell becomes 2; etc.], so that, except for the first column, the table is a numerical matrix.
2. Remove every nth column, or keep every nth column and remove all the others.
I am just starting to learn Perl, so I was wondering whether these operations are possible in Perl, whether Perl would be the best way to do them, and whether there are any suggestions for syntax for these operations in the context of reading from and writing to a file.
Upvotes: 2
Views: 172
Reputation: 3484
I'll start:
use strict;
use warnings;

while (<>) {
    chomp;
    my @cols = split ' ';             # split on runs of whitespace
    splice(@cols, 1, 6);              # example: remove columns 2-7
    my @transformed = ($cols[0]);     # reset for each row; keep the label column
    for (my $i = 1; $i < $#cols; $i += 2) {
        push @transformed, "$cols[$i]$cols[$i+1]";   # join each pair
    }
    # other transforms as required
    print join(' ', @transformed), "\n";
}
That should get you on your way.
Upvotes: 1
Reputation: 204477
You need to post some sample input and expected output, or we're just guessing at what you want, but maybe this will be a start:
awk '{
    printf "%s ", $1                 # keep the first (label) column
    for (i = 7; i <= NF; i += 2) {   # starting at column 7, join each pair
        printf "%s%s ", $i, $(i+1)
    }
    print ""                         # end the output row
}' file
Upvotes: 0