John Perkins
John Perkins

Reputation: 13

Append Columns to CSV in For Loop

I'm looking for some help with my script. I'm trying to loop over a bunch of CSV files, cut out the 3rd column, and append that to an output file as a new column. Here's what I have so far:

#!/bin/bash

for n in ~/sampledir/*
do
    awk -F "," '{print $3","}' $n >> output.csv
done

The output looks like this:

Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,
Column3,
3,
33,
333,
3333,
33333,

What I want is for the new information to be appended to the CSV in columns, so rather than the output above, I want this:

Column3,Column3,Column3,Column3,Column3,Column3,
3,3,3,3,3,3,
33,33,33,33,33,33,
333,333,333,333,333,333,
3333,3333,3333,3333,3333,3333,
33333,33333,33333,33333,33333,33333,

Any guidance would be helpful. Thanks y'all.

Upvotes: 1

Views: 1284

Answers (2)

thanasisp
thanasisp

Reputation: 5975

Using awk:

awk -F, '{row[FNR] = (FNR==NR? $3: row[FNR] FS $3)}
     END {for (i=1;i<=length(row);i++) print row[i]}' *.csv

Using paste (bash shell):

For a known number of files, you could have

paste -d, <(cut -d, -f3 file1.csv) <(cut -d, -f3 file2.csv)

but this doesn't scale for unknown number of csv files. A hacky way could be:

cmd="paste -d, "
for f in *.csv; do cmd+=' <(cut -d, -f3 '"$f"')'; done
eval "$cmd"

but better go with awk.


Note: Your csv files should not have delimiters nested into quoted fields. In this case you have to use a more complex field separator with GNU awk.

Upvotes: 1

Mitchell P
Mitchell P

Reputation: 308

If the awk that you're using is gawk, you could have an awk script like this:

BEGIN { 
    FS="," 
    file_num = 0
    max_num_rows = 0
}

BEGINFILE { file_num++  }

{ 
    data[FNR SUBSEP file_num] = $3 
    if (FNR > max_num_rows) { max_num_rows++ }
}

END {

    for (i = 1; i <= max_num_rows; i++) {
        printf data[i SUBSEP 1]

        for (j = 2; j <= file_num; j++) {
            printf "," data[i SUBSEP j]
        }
        printf "\n"
    }
}

and then use like:

awk -f script.awk ~/sampledir/* > output.csv

Basic idea is to store the data you want into a multidimensional awk array, and then loop over the rows and columns and print the data. If you don't have BEGINFILE available, you could do some extra logic to do something similar. This should also work if the number of rows are different in the files.

Upvotes: 2

Related Questions