Merge multiple files: 1st Column (same string), 2nd Column (unique values per file)

Question

I have multiple files (*.csv) that have a common column ("Common_Title") with corresponding values ("Value#") and I need to merge these into a summary file.

A difficult part is I need all values to get passed onto the summary file, but individual files sometimes lack a certain "Common_Title" and therefore need to put a "0000" in its "Value#" place.

This is 3 example files, all a bit different

# File1.csv

Common_Title,Value1
AAAAA,1111
BBBBB,1111
CCCCC,1111

# File2.csv

Common_Title,Value2
AAAAA,2222
BBBBB,2222
DDDDD,2222
EEEEE,2222

# File3.csv

Common_Title,Value3
AAAAA,3333
BBBBB,3333
CCCCC,3333
EEEEE,3333

which I'd like to merge into this summary file

# MergedFiles123.csv

Common_Title,Value1,Value2,Value3
AAAAA,1111,2222,3333
BBBBB,1111,2222,3333
CCCCC,1111,0000,3333
DDDDD,0000,2222,0000
EEEEE,0000,2222,3333

konsolebox · Accepted Answer

Using awk:

awk -F, '!a[$1]++ { keys[++i] = $1 } !r[FILENAME]++ { ++argind } { b[$1, argind] = $2 } END { for (i = 1; i in keys; ++i) { k = keys[i]; t = k; for (j = 1; j <= argind; ++j) t = t FS (length(b[k, j]) ? b[k, j] : "0000"); print t } }' file1 file2 file3

Output:

Common_Title,Value1,Value2,Value3
AAAAA,1111,2222,3333
BBBBB,1111,2222,3333
CCCCC,1111,0000,3333
DDDDD,0000,2222,0000
EEEEE,0000,2222,3333

Comments:

!a[$1]++ { keys[++i] = $1 } With the condition !a[$1]++, block ({}) could only evaluate when $1 is first encountered. Inside the block, the key ($1) is added to the keys array once.
!r[FILENAME]++ { ++argind } Increments the counter argind everytime a new file is encountered.
{ b[$1, argind] = $2 } Stores the data with reference to key and the current file's index argind.
END { for (i = 1; i in keys; ++i) { k = keys[i]; t = k; for (j = 1; j <= argind; ++j) t = t FS (length(b[k, j]) ? b[k, j] : "0000"); print t } }' Runs after all files are processed. It walks through every key in keys in order; and prints all data based from that key as a single line which is taken from b[] with included reference to the indices.

Merge multiple files: 1st Column (same string), 2nd Column (unique values per file)

Answers (2)

Related Questions