Reputation: 2641
Say I have this file1.csv consists of column A and column A_score looks like this:
fefa68e,312
wernjnn,432
ew443fs,300
and file2.csv consists of pairs of column B and column A looks like this:
dfaefew,fefa68e
dfaefew,wernjnn
vzcxvvz,ew443fs
ewrwefd,wernjnn
ewrwefd,ew443fs
How can I get a file3.csv to get the maximum of scores of all column A pairs with column B looks like this:
dfaefew,432
vzcxvvz,300
ewrwefd,432
and a file4.csv to get the average of scores of all column A pairs with column B looks like this:
dfaefew,372
vzcxvvz,300
ewrwefd,366
Can awk or anything else do the work? I am using ubuntu.
Thx ahead!
Update:
What if the file2.csv looks like this:
dfaefew,fefa68e,1
dfaefew,wernjnn,1
vzcxvvz,ew443fs,1
ewrwefd,wernjnn,0
ewrwefd,ew443fs,0
The third column could be 1 or 0 and 1 and it is the same for the same column 1 value(dfaefew, vzcxvvz etc) and I want to keep the third column and get output like this:
dfaefew,432,1
vzcxvvz,300,1
ewrwefd,432,0
dfaefew,372,1
vzcxvvz,300,1
ewrwefd,366,0
Upvotes: 0
Views: 111
Reputation: 77185
Here is one way of doing it in awk
:
script.awk
:# Set the input and output field separators to ","
BEGIN { FS = OFS = "," }
# Processing first file
# Load the first file in hash keyed at column1 having value of column2
NR==FNR { value[$1] = $2; next }
# Processing second file
# Keep a counter of column1 and add values for column2
{ count[$1]++; values[$1]+=value[$2] }
# Find the max for each entry of column1
{ entry[$1] = (($1 in entry) && entry[$1]>value[$2] ? entry[$1] : value[$2]) }
# In the END block traverse through array and print desired output.
END {
for (max in entry) print (max, entry[max]) > "file3.csv";
for (key in entry) print (key, values[key]/count[key]) > "file4.csv";
}
awk -f script.awk file1.csv file2.csv
$ cat file3.csv
vzcxvvz,300
ewrwefd,432
dfaefew,432
$ cat file4.csv
vzcxvvz,300
ewrwefd,366
dfaefew,372
Upvotes: 3