Reputation: 377
I have dozens of files, half TSV and half CSV. I'm copying specific columns from each of them and pasting them into a new TSV file. My code for that is below:
paste <(cut -d , -f 3 -s file.csv) <(cut -f 2 -s file.tsv) > merged.tsv
The TSV and CSV files share IDs in their filenames. For example, mary.tsv/mary.csv and joseph.tsv/joseph.csv.
How can I substitute mary.tsv and mary.csv into the cut commands by associating their filenames with each other?
So far I have:
tsvarray=(`find . -iname "*.tsv"`)
csvarray=(`find . -iname "*.csv"`)
I could then do something like the code below inside a for loop?
paste <(cut -d , -f 3 -s $csvarray[@] <(cut -f 2 -s $tsvarray[@]) > merged.tsv
Upvotes: 0
Views: 57
Reputation: 780655
You don't need a for loop. But you do need to make sure that the two arrays have filenames in the same order, so you should sort them.
You can use readarray and options in find and sort so that you don't have problems when filenames have spaces:
readarray -d '' tsvarray < <(find . -iname '*.tsv' -print0 | sort -z)
readarray -d '' csvarray < <(find . -iname '*.csv' -print0 | sort -z)
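As a rough illustration of why this matters, the sketch below builds the array both ways in a scratch directory. The file names and the variable names naive and safe are made up for the demo, and readarray -d needs bash 4.4 or newer.

```bash
#!/usr/bin/env bash
# Scratch directory with hypothetical sample files; one name contains a space.
dir=$(mktemp -d)
touch "$dir/mary smith.tsv" "$dir/joseph.tsv"

# Command substitution is word-split, so "mary smith.tsv" becomes two elements.
naive=( $(find "$dir" -iname '*.tsv') )

# NUL-delimited names survive intact, and sort -z fixes the order.
readarray -d '' safe < <(find "$dir" -iname '*.tsv' -print0 | sort -z)

echo "naive: ${#naive[@]} elements, safe: ${#safe[@]} elements"
rm -r "$dir"
```

The naive array ends up with one element too many, which would silently shift every later pairing.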
Then you need to use the correct syntax for referring to a subscripted array. You need {} around it. Then you should quote it, again to prevent problems when filenames contain whitespace.
paste <(cut -d , -f 3 -s "${csvarray[@]}") <(cut -f 2 -s "${tsvarray[@]}") > merged.tsv
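To see what the braces and quotes change, here is a small sketch with a made-up array. The names files and count_args are only for illustration and are not part of the question.

```bash
#!/usr/bin/env bash
# Hypothetical sample array; one name contains a space on purpose.
files=("mary smith.csv" "joseph.csv")

# Helper for the demo: prints how many arguments it received.
count_args() { echo "$#"; }

# Without braces, bash expands $files (element 0) and appends a literal "[@]".
wrong="$files[@]"

# With braces and quotes, each element becomes exactly one argument.
right_count=$(count_args "${files[@]}")

# With braces but no quotes, elements are split on whitespace again.
unquoted_count=$(count_args ${files[@]})

printf '%s\n' "$wrong" "$right_count" "$unquoted_count"
```

Only the quoted, braced form hands cut one argument per file.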
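The single paste command also assumes that every .csv has a matching .tsv and vice versa. Otherwise the arrays will not correspond. It also relies on the two files in each pair having the same number of lines, because cut concatenates its input files and paste lines them up by position.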
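If you want to verify that assumption before merging, something like the following could work. It is only a sketch: check_pairs is a hypothetical helper, the sample files are made up, and it assumes lowercase .csv/.tsv extensions even though -iname would also match uppercase ones.

```bash
#!/usr/bin/env bash
# Scratch directory with hypothetical files; joseph.tsv is deliberately missing.
dir=$(mktemp -d)
touch "$dir/mary.csv" "$dir/mary.tsv" "$dir/joseph.csv"

readarray -d '' csvarray < <(find "$dir" -iname '*.csv' -print0 | sort -z)
readarray -d '' tsvarray < <(find "$dir" -iname '*.tsv' -print0 | sort -z)

# check_pairs (hypothetical helper) reports every file that lacks a partner
# and returns non-zero if any were found.
check_pairs() {
    local status=0 f
    for f in "${csvarray[@]}"; do
        [[ -e "${f%.*}.tsv" ]] || { echo "no TSV for $f"; status=1; }
    done
    for f in "${tsvarray[@]}"; do
        [[ -e "${f%.*}.csv" ]] || { echo "no CSV for $f"; status=1; }
    done
    return "$status"
}

if report=$(check_pairs); then paired=yes; else paired=no; fi
echo "$report"
rm -r "$dir"
```

Running the merge only when the check succeeds avoids pasting columns from unrelated files next to each other.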
If you want separate merged files for each csv/tsv pair, you will need a loop:
for ((i = 0; i < ${#tsvarray[@]}; i++)); do
    # paste writes tab-separated output, so name the result .tsv
    paste <(cut -d , -f 3 -s "${csvarray[$i]}") <(cut -f 2 -s "${tsvarray[$i]}") > "${csvarray[$i]%.csv}.merged.tsv"
done
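An alternative that avoids keeping two arrays in sync is to loop over the .csv files and derive each partner's name from the shared ID. This is a sketch under a few assumptions: all files sit in one directory, the extensions are lowercase, and the sample data and the skipped array are made up for the demo.

```bash
#!/usr/bin/env bash
# Scratch directory with hypothetical sample data.
dir=$(mktemp -d)
printf 'a,b,c1\na,b,c2\n' > "$dir/mary.csv"
printf 'x\tt1\nx\tt2\n'   > "$dir/mary.tsv"
printf 'a,b,c3\n'         > "$dir/joseph.csv"   # no joseph.tsv on purpose

skipped=()
for csv in "$dir"/*.csv; do
    base=${csv%.csv}      # strip the extension to get the shared ID
    tsv=$base.tsv
    if [[ ! -f $tsv ]]; then
        echo "skipping $csv: no matching TSV" >&2
        skipped+=("$csv")
        continue
    fi
    # Third CSV column next to second TSV column, one merged file per pair.
    paste <(cut -d , -f 3 -s "$csv") <(cut -f 2 -s "$tsv") > "$base.merged.tsv"
done

merged=$(cat "$dir/mary.merged.tsv")
echo "$merged"
rm -r "$dir"
```

Because each pair is looked up by name, a missing partner is skipped with a warning instead of shifting every later pair.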
Upvotes: 1