user6587018

Reputation: 11

Checking 2 files in Unix and finding the sum of a particular column in a 3rd file through a shell script

I have something I need help with and would appreciate your help. Let's take an example.

I have file 1 with this data:

"eno", "ename", "salary"
"1","john","50000"
"2","steve","30000"
"3","aku","20000"

and I have file 2 with this data:

"eno", "ename", "incentives"
"1","john","2000"
"2","steve","5000"
"4","akshi","200"

And the expected output I want in a 3rd file is:

"eno", "ename", "t_salary"
"1","john","52000"
"2","steve","35000"

This is the expected result: eno and ename together should be used as the primary key, and the output should be shown as above.

Upvotes: 0

Views: 59

Answers (3)

agc

Reputation: 8406

Abbreviating the input files to f1 and f2, and breaking out the Swiss Army knife utils (plus a bashism):

head -n 1 f1 | sed 's/sal/t_&/' ; \
grep -h -f <(tail -qn +2 f1 f2 | tr ',' '\t' | sort -k1,2 | \
             rev | uniq -d -f1 | rev | \
             cut -f 2) \
      f1 f2 | \
tr -s ',"' '\t' | datamash -s -g2,3 sum 4 | sed 's/[^\t]*/"&"/g;s/\t/,/g'

Output:

"eno", "ename", "t_salary"
"1","john","52000"
"2","steve","35000"

The main job is fairly simple:

  1. grep searches for only those lines whose fields #1 and #2 are duplicated (and therefore add-able), and these are piped to...
  2. datamash, which does the adding (a small demo follows).
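
To get a feel for step 2: once tr has squeezed the matched lines into tab-separated form, each line starts with an empty field, which is why the grouping begins at column 2. Assuming GNU datamash, feeding it the two john rows by hand should produce:

$ printf '\t1\tjohn\t50000\t\n\t1\tjohn\t2000\t\n' | datamash -s -g2,3 sum 4
1	john	52000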

The rest of the code is reformatting needed to please the various text utils, which all seem to have minor but ugly format inconsistencies.

Those revs are only needed because uniq lacks most of sort's field functions.

The trs are because uniq also lacks a field-separator switch, and datamash can't sum quoted numbers. The final sed undoes all that tr-ing.
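
To see what the process substitution actually hands to grep, run the inner pipeline on its own; with the sample f1 and f2 it should emit only the names present in both files, quotes included:

$ tail -qn +2 f1 f2 | tr ',' '\t' | sort -k1,2 | rev | uniq -d -f1 | rev | cut -f 2
"john"
"steve"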

Upvotes: 0

Ed Morton

Reputation: 203229

$ cat tst.awk
BEGIN { FS="\"[[:space:]]*,[[:space:]]*\""; OFS="\",\"" }           # split on "," so only the outermost quotes survive
{ key = $1 FS $2 }                                                  # composite key: eno + ename
NR==FNR { sal[key] = $NF; next }                                    # 1st file: stash the salary for each key
key in sal { $3 = (FNR>1 ? $3+sal[key] : "t_salary") "\""; print }  # 2nd file: sum, or rename the header column

$ awk -f tst.awk file1 file2
"eno","ename","t_salary"
"1","john","52000"
"2","steve","35000"

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

Upvotes: 0

karakfa

Reputation: 67467

If your files are sorted and the first field is the key, you can join the files and work on the combined fields.

That is,

$ join -t, file1 file2
"eno", "ename", "salary", "ename", "incentives"
"1","john","50000","john","2000"
"2","steve","30000","steve","5000"

and your awk can be

... | awk -F, -v OFS=, 'NR==1{print ...}
                         NR>1{gsub(/"/,"",$3); 
                              gsub(/"/,"",$5); 
                              print $1,$2,$3+$5}'

Printing the header and quoting the total field are left as an exercise (one possible completion is sketched below).
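
For completeness, here is one way that exercise might be finished. This sketch is not karakfa's code; renaming the header column with sub() is an assumption based on the expected output in the question:

$ join -t, file1 file2 |
  awk -F, -v OFS=, '
      NR==1 { sub(/salary/, "t_salary", $3); print $1, $2, $3; next }  # header row: rename the column
            { gsub(/"/, "", $3); gsub(/"/, "", $5)                     # strip quotes so awk can add
              print $1, $2, "\"" $3+$5 "\"" }'                         # re-quote the total
"eno", "ename", "t_salary"
"1","john","52000"
"2","steve","35000"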

Upvotes: 1
