Reputation: 209
i have two text files
file 1
number,name,account id,vv,sfee,dac acc,TDID
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
file 2
number,account id,dac acc,TDID
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
i want to compare those two text files. if the four columns of file 2 is there in file 1 and equal means i want output like this
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' file2.txt file1.txt
.. this works good for comparing two single column in two files. i want to compare multiple column. any one have suggestion?
EDIT: From the OP's comments:
nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' file2.txt file1.txt
.. this works good for comparing two single column in two files. i want to compare multiple column. you have any suggestion?
Upvotes: 1
Views: 14911
Reputation: 4261
The statistical package R handles processing multiple csv tables really easily. See An Intro. to R or R for Beginners.
Upvotes: 0
Reputation: 140327
This awk one-liner works for multi-column on unsorted files:
awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
In order for this to work, it is imperative that the first file used for input (file1.txt in my example) be the file that only has 4 fields like so:
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
$ awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
Alternatively, you could also use the following syntax which more closely matches the one in your question but is not very readable IMHO
awk -F, 'NR==FNR{a[$1,$2,$3,$4];next} ($1SUBSEP$3SUBSEP$6SUBSEP$7 in a)' file1.txt file2.txt
Upvotes: 3
Reputation: 342
It's not an elegant one-liner, but you could do it with perl.
#!/usr/bin/perl
open A, $ARGV[0];
while(split/,/,<A>) {
$k{$_[0]} = [@_];
}
close A;
open B, $ARGV[1];
while(split/,/,<B>) {
print join(',',@{$k{$_[0]}}) if
defined($k{$_[0]}) &&
$k{$_[0]}->[2] == $_[1] &&
$k{$_[0]}->[5] == $_[2] &&
$k{$_[0]}->[6] == $_[3];
}
close B;
Upvotes: 1
Reputation: 28392
This is neither efficient nor pretty it will however get the job done. It is not the most efficient implementation as it parses file1 multiple times however it does not read the entire file into RAM either so has some benefits over the simple scripting approaches.
sed -n '2,$p' file1 | awk -F, '{print $1 "," $3 "," $6 "," $7 " " $0 }' | \
sort | join file2 - |awk '{print $2}'
This works as follows
sed -n '2,$p' file1
sends file1 to STDOUT without the header lineIn order for this to work you must ensure that file2 is sorted before running the command.
Running this against your example data gave the following result
7000,john,2,0,0,1,6 7001,elen,2,0,0,1,7 7002,sami,2,0,0,1,6 7003,mike,1,0,0,2,1
EDIT
I note from your comments you are getting a sorting error. If this error is occuring when sorting file2 before running the pipeline command then you could split the file, sort each part and then cat them back together again.
Something like this would do that for you
mv file2 file2.orig
for i in 0 1 2 3 4 5 6 7 8 9
do
grep "^${i}" file2.orig |sort > file2.$i
done
cat file2.[0-9] >file2
rm file2.[0-9] file2.orig
You may need to modify the variables passed to for if your file is not distributed evenly across the full range of leading digits.
Upvotes: 0
Reputation: 43688
Not really well tested, but this might work:
join -t, file1 file2 | awk -F, 'BEGIN{OFS=","} {if ($3==$8 && $6==$9 && $7==$10) print $1,$2,$3,$4,$6,$7}'
(Of course, this assumes the input files are sorted).
Upvotes: 0
Reputation: 38718
TxtSushi looks like what you want. It allows to work with CSV files using SQL.
Upvotes: 1
Reputation: 67760
Quick answer: Use cut
to split out the fields you need and diff
to compare the results.
Upvotes: 0