Campbell McGrouther
Campbell McGrouther

Reputation: 31

Adding columns to a csv table with AWK from multiple files

I'm looking to build a csv table by getting values from several files with AWK. I have it working with two files, but I can't scale it beyond that. I'm currently taking the output of the second file, and appending the third, and so on.

Here are example files:

#file1  #file2  #file3  #file4
100     45      1       5
200     23      1       2
300     29      2       1
400     0       1       2
500     74      4       5

This is the goal:

#data.csv
1,100,45,1,5
2,200,23,1,2
3,300,29,2,1
4,400,0,1,2
5,500,74,4,5

This is what I have working:

awk 'FNR==NR { a[FNR""] = NR", " $0","; next } { print a[FNR""], $0}' $file1 $file2

With the result:

1, 100, 45
2, 200, 23
3, 300, 29
4, 400, 0
5, 500, 74

But when I try and get it to work on 3 or more files, like so:

awk 'FNR==NR { a[FNR""] = NR", " $0","; next } { print a[FNR""], $0; next } { print a[FNR""], $0}' $file1 $file2 $file3

I get this output:

1, 100, 45
2, 200, 23
3, 300, 29
4, 400, 0
5, 500, 74
1, 100, 1 
2, 200, 1 
3, 300, 2
4, 400, 1
5, 500, 4

In the first column the line count restarts, and the second column it also repeats the first file. In the third column is where it adds the third and subsequent files as new rows, where I would expect these should be added as columns. No new rows required.

Any help would be greatly appreciated. I have learned most of my AWK from Stack Exchange, and I know I'm missing something fundamental here. Thanks,

Upvotes: 3

Views: 858

Answers (4)

thanasisp
thanasisp

Reputation: 5975

as already answered you can use paste. To get the exact output with comma delimited line numbering, you can do this

paste -d, file{1..4} | nl -s, -w1
  • -s, sets number separator as comma (default is tab).
  • -w1 sets number width, so there are no initial spaces (because default is bigger)

another solution with awk

awk    '{a[FNR]=a[FNR] "," $0} 
    END {for (i=1;i<=length(a);i++) print i a[i]}' file{1..4}

Upvotes: 5

Niall Cosgrove
Niall Cosgrove

Reputation: 1303

Here is a beginner friendly solution. If you need to manipulate the data on the way in you can clearly see which file is being read.
ARGIND is gawk specific. It tells us which file we are processing. We fill two arrays a and b from file1 and file2 and then print your desired output while processing file3.

awk '
ARGIND == 1 { a[FNR] = $0 ; next }
ARGIND == 2 { b[FNR] = $0 ; next }
ARGIND == 3 { print FNR "," a[FNR] "," b[FNR] "," $0 }
' file1 file2 file3

Output:

1,100,45,1
2,200,23,1
3,300,29,2
4,400,0,1
5,500,74,4

Upvotes: 1

randomir
randomir

Reputation: 18687

An awk solution for a variable number of files:

awk '{ !line[FNR] && line[FNR]=FNR; line[FNR]=line[FNR]","$0 }
     END { for (i=1; i<=length(line); i++) print line[i] }' file1 file2 ... fileN

For example:

$ awk '{ !line[FNR] && line[FNR]=FNR; line[FNR]=line[FNR]","$0 }
      END { for (i=1; i<=length(line); i++) print line[i] }' \
      <(seq 1 5) <(seq 11 15) <(seq 21 25) <(seq 31 35)
1,1,11,21,31
2,2,12,22,32
3,3,13,23,33
4,4,14,24,34
5,5,15,25,35

Upvotes: 1

Yoda
Yoda

Reputation: 445

Why don't you use paste and then simply number each row:-

paste -d"," file1 file2 file3 file4
100,45,1,5
200,23,1,2
300,29,2,1
400,0 ,1,2
500,74,4,5

Upvotes: 1

Related Questions