Reputation: 45
I have multiple .csv files and I want to extract a particular column from each. Lets say column 5. I want to take that column and add it to a csv file and append a new column onto it from each successive file. I am able to do this with the following code taken from someone else:
awk '{_[FNR]=(_[FNR] OFS $1)}END{for (i=1; i<=FNR; i++) {sub(/^ /,"",_[i]); print _[i]}}' input*.csv > output.csv`
When I look at the output file I notice that the order in which the columns is added isn't sequential. As a result I was hoping to modify the code so that the header of the column is the filename from which the column came. How can I go about doing this?
For example: input1.csv can be:
1,2,3,4,5
6,7,8,9,10
input2.csv could be:
11,12,13,14,15
16,17,18,19,20
And I would want the output.csv to be:
input1.csv, input2.csv
5,15
10,20
I hope this makes sense and thanks in advance.
Upvotes: 2
Views: 76
Reputation: 133458
Could you please try following. Written and tested with shown samples in GNU awk
.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
fileName=(fileName?fileName", ":"")FILENAME
}
{
max=(max>FNR?max:FNR)
val[FNR]=(val[FNR] == "" ? "" : val[FNR] OFS) $NF
}
END{
print fileName
for(i=1;i<=max;i++){
print val[i]
}
}
' *.csv > output.csv
With shown samples output file named output.csv
will be crated with following contents.
input1.csv, input2.csv
5,15
10,20
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ FS=OFS="," } ##Starting BEGIN section from here and setting field separator and output field separator as comma here.
FNR==1{ ##If this is first line of all Input_file
fileName=(fileName?fileName", ":"")FILENAME ##Creating fileName which has current Input_file name in it and keep adding it.
}
{
max=(max>FNR?max:FNR) ##Creating max, to get highest number of lines.
val[FNR]=(val[FNR] == "" ? "" : val[FNR] OFS) $NF ##Creating val with index of FNR and keep adding values of last field in it.
}
END{ ##Starting END block of this program from here.
print fileName ##Printing all file names in outputFile.
for(i=1;i<=max;i++){ ##Starting for loop from 1 to max here.
print val[i] ##Printing array val value here.
}
}
' *.csv > output.csv ##Mentioning all *.csv files here.
Upvotes: 2