Reputation: 155
I have two files and I would like to match column 2 from file1 with column NF from file2. If they match I would like to output the whole line from file2 with, with in addition column 5 from file1 and column 5 from file 1 multiplied with column NF-2 from file 2at the end. The files have different lenghts.
I have the following two file-types:
file1
xx name1 1 we freq1
xy name2 2 wer freq2
xz name3 3 werf freq3
file2
..... value1 cv name1
..... value4 cvb name4
..... value3 cvb name3
..... value1 vbn name5
desired output
..... value1 cv name1 freq1 (freq1*value1)
..... value3 cvb name3 freq3 (freq3*value3)
I have tried doing this using awk.
awk 'FNR==NR { two[$0]++; next } { for (i in two) { split(i, one); if (one[2] == $NF) print $0, one[5], $(NF-2)*one[5] } }' file1 file2 > file3
It works, but is super inefficient, and makes my comp crash after a while. The files are about 100,000 lines each. Maybe I should use perl or python for this? Or is there a way to read file 2 as well? Thanks!
Upvotes: 1
Views: 586
Reputation: 50637
perl -ane'
BEGIN{ open($f,shift)||die$!; %h = (%h, (split)[1,4]) while <$f> }
$fq = $h{$F[-1]} or next;
s|$| sprintf(" %s (%s)", $fq, "$fq*$F[-3]") |e;
print;
' file1 file2 > file3
Expressed as a standalone script:
#!/usr/bin/perl
use strict; use warnings;
my %frequencies;
my $frequency_file = shift @ARGV;
open my $fh, "<", $frequence_file or die "Can't open $frequency_file: $!";
while (<$fh>) {
my (undef, $name, undef, undef, $freq) = split;
$frequencies{$name} = $freq;
}
while (<>) {
my @fields = split;
my ($value, $name) = @fields[-3, -1];
next if not exists $frequencies{$name};
my $freq = $frequencies{$name};
print join(" ", @fields, $freq, "($freq * $value)"), "\n";
}
output
..... value1 cv name1 freq1 (freq1*value1)
..... value3 cvb name3 freq3 (freq3*value3)
Remove double qoutes for "$fq*$F[-3]"
to do actual multiplication.
Upvotes: 2
Reputation: 203368
All you need is:
awk 'NR==FNR{a[$2]=$NF;next} $NF in a{print $0, a[$NF], a[$NF] * $(NF-2)}' file1 file2
e.g.:
$ cat file1
xx name1 1 we 3
xy name2 2 wer 5
xz name3 3 werf 7
$ cat file2
..... 2 cv name1
..... 4 cvb name4
..... 6 cvb name3
..... 8 vbn name5
$ awk 'NR==FNR{a[$2]=$NF;next} $NF in a{print $0, a[$NF], a[$NF] * $(NF-2)}' file1 file2
..... 2 cv name1 3 6
..... 6 cvb name3 7 42
Upvotes: 3
Reputation: 126722
It sounds like the best way is to read all of the relevant columns (2 and 5) from file1
into a hash, and then just process file2
line by line.
If the hash uses file1
column 2 as a key, then it can simply be indexed with the value of the last column of file2
to see if there was a corresponding value in the other file.
Then all that is necessary is to print the values from the file2
record, plus the calculated values using the data from the hash element.
This program demonstrates. The variable names are a little obscure as you don't give a meaning to the fields in your question, so all I can do is refer to file and column numbers.
I have had to output string*string
instead of the actual product, because your data doesn't have numerical values.
use strict;
use warnings;
use autodie;
open my $fh1, '<', 'file1';
my %file1;
while (<$fh1>) {
my ($f1_2, $f1_5) = (split)[1,4];
$file1{$f1_2} = $f1_5;
}
open my $fh2, '<', 'file2';
while (<$fh2>) {
my @f2_rec = split;
my ($f2_nm2, $f2_n) = @f2_rec[-3,-1];
if (my $f1_5 = $file1{$f2_n}) {
print join(' ', @f2_rec, $f1_5, "$f1_5*$f2_nm2"), "\n";
}
}
output
..... value1 cv name1 freq1 freq1*value1
..... value3 cvb name3 freq3 freq3*value3
Upvotes: 3