Reputation: 11
I have something I need help with, would appreciate your help Let's take an example
I have file 1 with data
"eno", "ename", "salary"
"1","john","50000"
"2","steve","30000"
"3","aku","20000"
and I have file 2 with data
"eno", "ename", "incentives"
"1","john","2000"
"2","steve","5000"
"4","akshi","200"
And the expected output in 3 file I want is :
"eno", "ename", "t_salary"
"1","john","52000"
"2","steve","35000"
This is what is expected result
as I should be using eno and the ename as the primary key and output should be shown like this
Upvotes: 0
Views: 59
Reputation: 8406
Abbreviating the input files to f1 & f2, and breaking out the swiss army knife utils, (plus a bashism):
head -n 1 f1 | sed 's/sal/t_&/' ; \
grep -h -f <(tail -qn +2 f1 f2 | tr ',' '\t' | sort -k1,2 | \
rev | uniq -d -f1 | rev | \
cut -f 2) \
f1 f2 | \
tr -s ',"' '\t' | datamash -s -g2,3 sum 4 | sed 's/[^\t]*/"&"/g;s/\t/,/g'
Output:
"eno", "ename", "t_salary"
"1","john","52000"
"2","steve","35000"
The main job is fairly simple:
grep
searches for only those lines with duplicate (and therefore add-able) fields #1 & #2, and this is piped to...datamash
which does the adding.The rest of the code is reformatting needed to please the various text utils which all seem to have ugly but minor format inconsistencies.
Those rev
s are only needed because uniq
lacks most of sort
's field functions.
The tr
s are because uniq
also lacks a field separator switch, and datamash
can't sum quoted numbers. The sed
at the end is to undo all that tr
-ing.
Upvotes: 0
Reputation: 203229
$ cat tst.awk
BEGIN { FS="\"[[:space:]]*,[[:space:]]*\""; OFS="\",\"" }
{ key = $1 FS $2 }
NR==FNR { sal[key] = $NF; next }
key in sal { $3 = (FNR>1 ? $3+sal[key] : "t_salary") "\""; print }
$ awk -f tst.awk file1 file2
"eno","ename","t_salary"
"1","john","52000"
"2","steve","35000"
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
Upvotes: 0
Reputation: 67467
if your files are sorted and first field is the key, you can join
the files and work on the combined fields
that is,
$ join -t, file1 file2
"eno", "ename", "salary", "ename", "incentives"
"1","john","50000","john","2000"
"2","steve","30000","steve","5000"
and your awk
can be
... | awk -F, -v OFS=, 'NR==1{print ...}
NR>1{gsub(/"/,"",$3);
gsub(/"/,"",$5);
print $1,$2,$3+$5}'
printing header and quoting the total field is left as an exercise.
Upvotes: 1