Reputation: 327
I have a big tab-delimited file (10 GB) with 8 columns.
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
101_#2 1 2 F0 263 248 2 1.5
102_#1 1 6 F1 766 741 1 1.0
103_#1 2 15 V1 526 501 1 0.0
103_#1 2 9 V2 103 178 1 1.3
104_#1 1 12 V3 137 112 1 1.0
105_#1 1 17 F2 766 741 1 1.0
I want to multiply the number that follows "#" in col1 by the value in col8 and write the result back after the "#", so that the output is:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
101_#3 1 2 F0 263 248 2 1.5
102_#1 1 6 F1 766 741 1 1.0
103_#0 2 15 V1 526 501 1 0.0
103_#1.3 2 9 V2 103 178 1 1.3
104_#1 1 12 V3 137 112 1 1.0
105_#1 1 17 F2 766 741 1 1.0
The first row is a header, and I want it to appear unchanged in the output.
Effort:
use strict;
use warnings;
@ARGV or die "No input file specified";
open my $fh, '<', $ARGV[0] or die "Unable to open input file: $!";
print scalar(<$fh>);
while (<$fh>) {
    chomp;
}
Upvotes: 0
Views: 633
Reputation: 67900
If your data is proper CSV data, I would suggest using a CSV module to parse it, for example Text::CSV or Text::CSV_XS.
Replace the DATA and STDOUT file handles as required. The CSV options may need to be tweaked to fit your data, refer to the documentation. This is a basic usage of the module Text::CSV_XS:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({
    sep_char => "\t",
    binary   => 1,
    eol      => $/,
});
my $hrs = <DATA>;
print $hrs;
while (my $row = $csv->getline(*DATA)) {
    $row->[0] =~ s/#\K(\d+)$/ $row->[7] * $1 /e;
    $csv->print(*STDOUT, $row );
}
__DATA__
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
101_#2 1 2 F0 263 248 2 1.5
102_#1 1 6 F1 766 741 1 1.0
103_#1 2 15 V1 526 501 1 0.0
103_#1 2 9 V2 103 178 1 1.3
104_#1 1 12 V3 137 112 1 3.0
105_#1 1 17 F2 766 741 1 23.0
Note that data above may not contain proper tabs due to StackOverflow conversion.
Upvotes: 0
Reputation: 15264
Use unpack:
use strict;
use warnings;
no warnings 'uninitialized';
# fixed-width file, so use unpack
# offsets: 20 28 33 42 58 74 82
my $header = <>; # ignore
while ( <> ) {
    # print;
    my @cols = unpack 'a19 a8 a5 a9 a16 a16 a8 a*';
    # print "$_\n" for @cols; exit;
    s/\s+$// for @cols; # trim
    # print join(', ', @cols), "\n";
    my $num;
    if ( 0 <= (my $idx = rindex $cols[0], '#') ) {
        $num = substr $cols[0], $idx + 1;
    }
    else {
        warn "no number after # in col1\n";
    }
    printf "%f * %f = %f\n", $num, $cols[7], $num * $cols[7];
}
Upvotes: 0
Reputation: 37136
In the absence of a concerted effort on the OP's part, an explanation should suffice:
The -i flag will enable in-place editing of the file. -i.bak creates a backup.
Use $. in a conditional to skip the header line.
The -a flag will autosplit the line on whitespace to generate the @F array. The -F flag could be used to specify the split delimiter. Testing for @F emptiness can also be employed to skip empty lines.
The s///e construct will be useful for updating the value to what you desire.
The -l flag is highly recommended.
See perldoc perlrun, perldoc perlretut and perldoc perlop for more information.
Upvotes: 2
Reputation: 206669
Here's one way you could do it. The idea is to print the header unchanged, then split each remaining line into columns and extract the information you want.
use strict;
use warnings;
# Print the header row unchanged
print scalar(<>);
# Process every other line
while (<>) {
    # Pass empty lines through untouched
    print and next if /^\s*$/;
    # Split on whitespace
    my @cols = split(/\s+/);
    # Split the first column on '#', removing it from the column list
    my ($p1, $p2) = split(/#/, shift @cols);
    # Multiply and print (original whitespace is replaced with tabs)
    print $p1, "#", $cols[6]*$p2, "\t", join("\t", @cols), "\n";
}
Upvotes: 1