Reputation: 704
I would like to combine 96 files by taking the second column from each files and keep the first column which is similar between all files. I tried to do this in R, but figued it would be better in the terminal. Does it work using awk?
Sample data:
DMED7013:Rfam robinm$ head Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput402R.sam
Seq_../trimmed/402R.tally.fasta __not_aligned
__too_low_aQual 3
mir-10 5
Y_RNA 4
__too_low_aQual 0
__too_low_aQual 0
__not_aligned 1
mir-8 2
mir-671 3
mir-671 16
The files:
DMED7013:Rfam robinm$ ls -l
-rw-r--r-- 1 robinm staff 1711388 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100G.sam
-rw-r--r-- 1 robinm staff 1712778 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput100R.sam
-rw-r--r-- 1 robinm staff 1709703 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106G.sam
-rw-r--r-- 1 robinm staff 1707486 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput106R.sam
-rw-r--r-- 1 robinm staff 1704757 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122G.sam
-rw-r--r-- 1 robinm staff 1705471 Sep 22 19:12 Rfam_Counts_combined_SplitRfam_Counts_combinedhtseq_Rfamoutput122R.sam
.....
Upvotes: 1
Views: 224
Reputation: 67527
this script will do what you need
t0=mktemp; touch t0;
for f in prefix*.csv;
do paste t0 <(cut -d" " -f2 $f) > t1 && mv t1 t0;
done;
tr '\t' ' ' <t0 && rm t0
use cut/paste to collect second columns in a temp file; when done remove the temp file after printing results.
To preserve the first column change touch t0
to cut -d" " -f1 oneofthefiles.csv > t0
Alternatively, awk to the rescue!
awk '
{a[FNR]=a[FNR]?a[FNR] OFS $2:$1}
END{for(i=1;i<=FNR;i++) print a[i]}
' prefix*.csv
combine second field from all files based on the line number, preserve the first field from first file.
Upvotes: 2
Reputation: 8174
you can try (if, I understood correctly)
awk '!($1 in d){d[$1]=$2; next}
{d[$1]+=$2}
END{for(key in d) print key, d[key]; }' *.sam
you get:
__too_low_aQual 3 mir-671 19 mir-8 2 __not_aligned 1 Y_RNA 4 mir-10 5
Upvotes: 2