BenWest
BenWest

Reputation: 45

How to add column n from all csv files and store in another csv file (with the header being the filename from which the column came from)?

I have multiple .csv files and I want to extract a particular column from each. Lets say column 5. I want to take that column and add it to a csv file and append a new column onto it from each successive file. I am able to do this with the following code taken from someone else:

awk '{_[FNR]=(_[FNR] OFS $1)}END{for (i=1; i<=FNR; i++) {sub(/^ /,"",_[i]); print _[i]}}' input*.csv > output.csv`

When I look at the output file I notice that the order in which the columns is added isn't sequential. As a result I was hoping to modify the code so that the header of the column is the filename from which the column came. How can I go about doing this?

For example: input1.csv can be:

1,2,3,4,5
6,7,8,9,10

input2.csv could be:

11,12,13,14,15
16,17,18,19,20

And I would want the output.csv to be:

input1.csv, input2.csv
5,15
10,20

I hope this makes sense and thanks in advance.

Upvotes: 2

Views: 76

Answers (1)

RavinderSingh13
RavinderSingh13

Reputation: 133458

Could you please try following. Written and tested with shown samples in GNU awk.

awk '
BEGIN{ FS=OFS="," }
FNR==1{
  fileName=(fileName?fileName", ":"")FILENAME
}
{
  max=(max>FNR?max:FNR)
  val[FNR]=(val[FNR] == "" ? "" : val[FNR] OFS) $NF
}
END{
  print fileName
  for(i=1;i<=max;i++){
    print val[i]
  }
}
' *.csv  > output.csv

With shown samples output file named output.csv will be crated with following contents.

input1.csv, input2.csv
5,15
10,20

Explanation: Adding detailed explanation for above.

awk '                                           ##Starting awk program from here.
BEGIN{ FS=OFS="," }                             ##Starting BEGIN section from here and setting field separator and output field separator as comma here.
FNR==1{                                         ##If this is first line of all Input_file
  fileName=(fileName?fileName", ":"")FILENAME   ##Creating fileName which has current Input_file name in it and keep adding it.
}
{
  max=(max>FNR?max:FNR)                         ##Creating max, to get highest number of lines.
  val[FNR]=(val[FNR] == "" ? "" : val[FNR] OFS) $NF        ##Creating val with index of FNR and keep adding values of last field in it.
}
END{                                            ##Starting END block of this program from here.
  print fileName                                ##Printing all file names in outputFile.
  for(i=1;i<=max;i++){                          ##Starting for loop from 1 to max here.
    print val[i]                                ##Printing array val value here.
  }
}
' *.csv > output.csv                            ##Mentioning all *.csv files here.

Upvotes: 2

Related Questions