Reputation: 19
I have an unknown number of input files that all match a search pattern, let's say *.dat, and all have 2 columns of data and an equal number of rows. In bash I need to take the 2nd column of each file and append it as a new column in a single merged file.
E.g.:
>>cat File1.dat
1 A
2 B
3 C
>>cat File2.dat
4 D
5 E
6 F
>>cat combined.dat
A D
B E
C F
Here is the code I have tried; the approach I have gone for is to loop and append:
for filename in $(ls *.dat); do paste combined.dat <(awk '{print $2}' $filename) >> combined.dat; done
The output format can be anything so long as it's tab-delimited, and the key is that it must work on any number of input files — up to roughly 100 — where the number isn't known in advance.
Upvotes: 0
Views: 39
Reputation: 27205
Since you already use awk, you could do the whole work in awk:
rm -f combined.dat
awk 'FNR<NR{d="\t"} {a[FNR]=a[FNR] d $2} END{for(i=1;i<=FNR;i++) print a[i]}' *.dat > combined.dat
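For example, with the two sample files from the question, this produces the desired output (a quick self-contained demo; the scratch directory just keeps the *.dat glob clean):

```shell
# Work in a scratch directory so *.dat matches only our sample files
cd "$(mktemp -d)"

# Sample inputs from the question
printf '1 A\n2 B\n3 C\n' > File1.dat
printf '4 D\n5 E\n6 F\n' > File2.dat

# d is empty for the first file and a tab for every following file,
# so each row collects column 2 of every file, tab-separated
awk 'FNR<NR{d="\t"} {a[FNR]=a[FNR] d $2} END{for(i=1;i<=FNR;i++) print a[i]}' *.dat > combined.dat

cat combined.dat   # A D / B E / C F, tab-separated
```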
paste
You can repeatedly paste combined.dat and the next found file. The only tricky part is getting the first paste right, where combined.dat does not exist or is empty. You could use an if, but that would be boring. Here we use a trick: paste acts like cat when used with only one argument. With arrays we can conveniently specify optional further arguments. We also use sponge from moreutils to make sure that combined.dat is not mangled by concurrent reads and writes — if you don't want to install sponge, you have to use a temporary file or variables instead.
rm -f combined.dat
p=()
for f in *.dat; do
  awk '{print $2}' "$f" | paste "${p[@]}" - | sponge combined.dat
  p=(combined.dat)
done
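If you'd rather not install sponge, the same loop with a temporary file might look like this (a sketch reusing the question's sample data; combined.tmp is an arbitrary name, and mv plays the role sponge played above):

```shell
cd "$(mktemp -d)"                      # scratch directory for the demo
printf '1 A\n2 B\n3 C\n' > File1.dat   # sample inputs from the question
printf '4 D\n5 E\n6 F\n' > File2.dat

rm -f combined.dat
p=()
for f in *.dat; do
  # paste writes to a temp file, so it never reads and writes combined.dat at once
  awk '{print $2}' "$f" | paste "${p[@]}" - > combined.tmp
  mv combined.tmp combined.dat
  p=(combined.dat)
done
```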
paste via eval
Alternatively, you could build a bash command and execute that. No worries, eval is safe here as printf %q ensures correct quoting.
rm -f combined.dat
eval "paste $(printf "<(awk '{print \$2}' %q) " *.dat) > combined.dat"
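To see why this works, here is roughly what the generated command expands to for the question's two sample files (a self-contained illustration, assuming bash for the process substitutions):

```shell
cd "$(mktemp -d)"                      # scratch directory for the demo
printf '1 A\n2 B\n3 C\n' > File1.dat   # sample inputs from the question
printf '4 D\n5 E\n6 F\n' > File2.dat

# For these two files the eval'd string is equivalent to:
paste <(awk '{print $2}' File1.dat) <(awk '{print $2}' File2.dat) > combined.dat
```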
Upvotes: 2
Reputation: 51
A short draft; in particular, the way the newlines and tabs are inserted could be optimized:
#!/bin/bash
files=(*.dat)
nrLines=$(( $(wc -l < "${files[0]}") ))
i=1
while [ "$i" -le "$nrLines" ]; do
  for file in "${files[@]}"; do
    awk -v line="$i" 'NR==line {printf "%s", $2}' "$file" >> consolidatedreport.txt
    printf '\t' >> consolidatedreport.txt
  done
  i=$((i+1))
  echo "" >> consolidatedreport.txt
done
Be careful: depending on how you output data to your new file and how you iterate over your existing files, you might end up iterating over your newly created file. So be sure either to use a different extension than *dat if you iterate over all files with that ending (I used txt in the example), or to place the resulting file in a subfolder.
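For reference, a condensed, self-contained run of the same row-by-row idea on the question's sample data (the txt extension keeps the result out of the *dat glob, as noted above):

```shell
cd "$(mktemp -d)"                      # scratch directory for the demo
printf '1 A\n2 B\n3 C\n' > File1.dat   # sample inputs from the question
printf '4 D\n5 E\n6 F\n' > File2.dat

files=(*.dat)
nrLines=$(( $(wc -l < "${files[0]}") ))
i=1
while [ "$i" -le "$nrLines" ]; do
  for file in "${files[@]}"; do
    # print field 2 of row i, followed by a tab
    awk -v line="$i" 'NR==line {printf "%s\t", $2}' "$file" >> consolidatedreport.txt
  done
  echo "" >> consolidatedreport.txt   # end of row
  i=$((i+1))
done
```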
Upvotes: 0