Reputation: 531
I'd prefer a solution that uses bash rather than converting to a dataframe in python, etc as the files are quite big
I have a folder of CSVs that I'd like to merge into one CSV. The CSVs all have the same header save a few exceptions so I need to rewrite the name of each added column with the filename as a prefix to keep track of which file the column came from.
head globcover_color.csv glds00g.csv
==> file1.csv <==
id,max,mean,90
2870316.0,111.77777777777777
2870317.0,63.888888888888886
2870318.0,73.6
2870319.0,83.88888888888889
==> file2.csv <==
ogc_fid,id,_sum
"1","2870316",9.98795110916615
"2","2870317",12.3311055738527
"3","2870318",9.81535963468479
"4","2870319",7.77729743926775
The id
column of each file might be in a different "datatype" but in every file the id matches the line number. For example, line 2 is always id 2870316
.
Anticipated output:
file1_id,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
I'm not quite sure how to do this but I think I'd use the paste command at some point. I'm surprised that I couldn't find a similar question on stackoverflow but I guess it's not that common to have CSV with the same id on the same line number
edit:
I figured out the first part.
paste -d , * > ../rasterjointest.txt
achieves what I want but the header needs to be replaced
Upvotes: 0
Views: 253
Reputation: 203229
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
fname = FILENAME
sub(/\.[^.]+$/,"",fname)
for (i=1; i<=NF; i++) {
$i = fname "_" $i
}
}
{ row[FNR] = (NR==FNR ? "" : row[FNR] OFS) $0 }
END {
for (rowNr=1; rowNr<=FNR; rowNr++) {
print row[rowNr]
}
}
$ awk -f tst.awk file1.csv file2.csv
file1_id,file1_max,file1_mean,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
To use minimal memory in awk:
$ cat tst.awk
BEGIN {
FS=OFS=","
for (fileNr=1; fileNr<ARGC; fileNr++) {
filename = ARGV[fileNr]
if ( (getline < filename) > 0 ) {
fname = filename
sub(/\.[^.]+$/,"",fname)
for (i=1; i<=NF; i++) {
$i = fname "_" $i
}
}
row = (fileNr==1 ? "" : row OFS) $0
}
print row
exit
}
$ awk -f tst.awk file1.csv file2.csv; paste -d, file1.csv file2.csv | tail -n +2
file1_id,file1_max,file1_mean,file1_90,file2_ogc_fid,file2_id,file2__sum
2870316.0,111.77777777777777,"1","2870316",9.98795110916615
2870317.0,63.888888888888886,"2","2870317",12.3311055738527
2870318.0,73.6,"3","2870318",9.81535963468479
2870319.0,83.88888888888889,"4","2870319",7.77729743926775
Upvotes: 1