SaltedPork

Reputation: 377

How to create an associative hash with multiple arrays?

I have dozens of files, half TSV and half CSV. I'm copying specific columns from each of them and pasting them into a new TSV file. I have the code for that below:

paste <(cut -d , -f 3 -s file.csv) <(cut -f 2 -s file.tsv) > merged.tsv

The TSV and CSV files share IDs in their filenames, for example mary.tsv/mary.csv and joseph.tsv/joseph.csv.

How can I substitute mary.tsv and mary.csv into the cut commands by associating their filenames with each other?

So far I have:

tsvarray=(`find . -iname "*.tsv"`)
csvarray=(`find . -iname "*.csv"`)

I could then do something like the code below inside a for loop?

paste <(cut -d , -f 3 -s $csvarray[@] <(cut -f 2 -s $tsvarray[@]) > merged.tsv

Upvotes: 0

Views: 57

Answers (1)

Barmar

Reputation: 780655

You don't need a for loop. But you do need to make sure that the two arrays have filenames in the same order, so you should sort them.

You can use readarray together with the null-delimiter options of find and sort so that filenames containing spaces don't cause problems (readarray -d requires bash 4.4 or later):

readarray -d '' tsvarray < <(find . -iname '*.tsv' -print0 | sort -z)
readarray -d '' csvarray < <(find . -iname '*.csv' -print0 | sort -z)

Then you need to use the correct syntax for referring to a subscripted array: it needs {} around it, as in ${csvarray[@]}. You should also quote it, again to prevent problems when filenames contain whitespace.

paste <(cut -d , -f 3 -s "${csvarray[@]}") <(cut -f 2 -s "${tsvarray[@]}") > merged.tsv
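
To see why the braces and quotes matter, here is a small sketch comparing the three spellings. The filenames and the count_args helper are made up for illustration; one name contains a space on purpose.

```bash
#!/usr/bin/env bash
# Hypothetical sample names, not taken from the question's directory.
csvarray=("./mary.csv" "./joseph smith.csv")

# Unbraced: bash expands $csvarray (element 0 only) and then
# appends the literal text "[@]".
unbraced="$csvarray[@]"

# Helper (illustrative name) that reports how many arguments it received.
count_args() { echo "$#"; }

# Braced and quoted: one word per element, spaces preserved.
braced_count=$(count_args "${csvarray[@]}")

# Braced but unquoted: word splitting breaks "joseph smith.csv" in two.
unquoted_count=$(count_args ${csvarray[@]})

printf '%s\n' "$unbraced" "$braced_count" "$unquoted_count"
```

With these two sample names, the unbraced form yields `./mary.csv[@]`, the quoted form passes 2 arguments, and the unquoted form passes 3.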

This also assumes every .csv has a matching .tsv and vice versa. Otherwise the arrays will not correspond.
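
If that assumption might not hold, one alternative is to derive each .tsv name from its .csv name instead of relying on array position, which is closer to the "associate by filename" idea in the question. The following is only a sketch: merge_pairs is a hypothetical helper name, and unlike the find commands above it looks in a single directory (no recursion) and matches the extensions case-sensitively.

```bash
#!/usr/bin/env bash
# Sketch: pair files by shared basename rather than by array index.
# merge_pairs is an illustrative name; the directory argument defaults to ".".
merge_pairs() {
    local dir=${1:-.} csv tsv
    for csv in "$dir"/*.csv; do
        [ -e "$csv" ] || continue      # the glob matched nothing
        tsv="${csv%.csv}.tsv"          # mary.csv -> mary.tsv
        if [ ! -f "$tsv" ]; then
            echo "skipping $csv: no matching $tsv" >&2
            continue
        fi
        # Column 3 of the CSV next to column 2 of the TSV, as in the question.
        paste <(cut -d , -f 3 -s "$csv") <(cut -f 2 -s "$tsv")
    done
}
```

It could be called as `merge_pairs . > merged.tsv`. Unpaired .csv files are reported on stderr and skipped, so a missing partner cannot shift the remaining pairs out of alignment.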

If you want separate merged files for each csv/tsv pair, you will need a loop:

for ((i = 0; i < ${#tsvarray[@]}; i++)); do
    paste <(cut -d , -f 3 -s "${csvarray[$i]}") <(cut -f 2 -s "${tsvarray[$i]}") > "${csvarray[$i]/.csv/.merged.csv}"
done

Upvotes: 1
