Reputation: 704
I have 100 files with similar format (shown below). I want to add a new column to every files that combines the column mir
and seq
separated by an _
This has to be done automatically for all files. All my files have these names:
Miraligner_240G.txt.mirna
Miraligner_239R.txt.mirna
....
They differ between _ and .txt.mirna
My input files look like this:
seq name freq mir start end mism add t5 t3 s5 s3 DB ambiguity
TGAGAAGAAGCACTGTAGCTCTT seq_100006_x2 2 hsa-miR-143-3p 61 81 6AT u-TT 0 0 AGTCTGAG GCTCAGGA miRNA 1
GACCCTGTAGATCCGAATTTGTA seq_100012_x2 2 hsa-miR-10a-5p 22 43 1GT u-A 0 u-G TATATACC TGTGTAAG miRNA 1
GACCCTGTAGATCCGAATTTGTG seq_100013_x35 35 hsa-miR-10a-5p 22 44 1GT 0 0 0 TATATACC TGTGTAAG miRNA 1
TTAGGGCCCTGGCTCCATCT seq_100019_x13 13 hsa-miR-1296-5p 16 35 0 0 0 u-CC TGGGTTAG CTCCTTTA miRNA 1
GTGAACGGGCGCCATCCCGAGGCTT seq_100029_x2 2 hsa-miR-887-3p 48 72 0 0 0 d-CTT TGGAGTGA GAGGCTTT miRNA 1
ACCCGGTAGATCCGAATTTGTG seq_10002_x5 5 hsa-miR-10a-5p 23 44 5GT 0 d-T 0 TATATACC TGTGTAAG miRNA 1
CAACGGAATCCCAAAAGCAGCTGAAAA seq_100031_x3 3 hsa-miR-191-5p 16 39 24AT u-AAA 0 d-T CGGGCAAC GCTGTTGT miRNA 1
TAGTGCAATATTGCTTATAGGGTAT seq_100033_x2 2 hsa-miR-454-3p 64 86 0 u-AT 0 0 TGAGTAGT GGGTTTTG miRNA 1
CAACGGAATCCGAAAAGCAGCTG seq_100037_x16 16 hsa-miR-191-5p 16 38 12GC 0 0 0 CGGGCAAC GCTGTTGT miRNA 1
My output files should look like this:
mir_seq seq name freq mir start end mism add t5 t3 s5 s3 DB ambiguity
hsa-miR-143-3p_TGAGAAGAAGCACTGTAGCTCTT TGAGAAGAAGCACTGTAGCTCTT seq_100006_x2 2 hsa-miR-143-3p 61 81 6AT u-TT 0 0 AGTCTGAG GCTCAGGA miRNA 1
....
....
Upvotes: 0
Views: 1791
Reputation: 3223
Using awk
you can add a column containing values of column 1 and 4 as:
awk '{print $4"_"$1, $0}' filename
OFS
will print the value of output field separator variable.
On piping the output to column -t
:
mir_seq seq name freq mir start end mism add t5 t3 s5 s3 DB ambiguity
hsa-miR-143-3p_TGAGAAGAAGCACTGTAGCTCTT TGAGAAGAAGCACTGTAGCTCTT seq_100006_x2 2 hsa-miR-143-3p 61 81 6AT u-TT 0 0 AGTCTGAG GCTCAGGA miRNA 1
hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTA GACCCTGTAGATCCGAATTTGTA seq_100012_x2 2 hsa-miR-10a-5p 22 43 1GT u-A 0 u-G TATATACC TGTGTAAG miRNA 1
hsa-miR-10a-5p_GACCCTGTAGATCCGAATTTGTG GACCCTGTAGATCCGAATTTGTG seq_100013_x35 35 hsa-miR-10a-5p 22 44 1GT 0 0 0 TATATACC TGTGTAAG miRNA 1
hsa-miR-1296-5p_TTAGGGCCCTGGCTCCATCT TTAGGGCCCTGGCTCCATCT seq_100019_x13 13 hsa-miR-1296-5p 16 35 0 0 0 u-CC TGGGTTAG CTCCTTTA miRNA 1
hsa-miR-887-3p_GTGAACGGGCGCCATCCCGAGGCTT GTGAACGGGCGCCATCCCGAGGCTT seq_100029_x2 2 hsa-miR-887-3p 48 72 0 0 0 d-CTT TGGAGTGA GAGGCTTT miRNA 1
hsa-miR-10a-5p_ACCCGGTAGATCCGAATTTGTG ACCCGGTAGATCCGAATTTGTG seq_10002_x5 5 hsa-miR-10a-5p 23 44 5GT 0 d-T 0 TATATACC TGTGTAAG miRNA 1
hsa-miR-191-5p_CAACGGAATCCCAAAAGCAGCTGAAAA CAACGGAATCCCAAAAGCAGCTGAAAA seq_100031_x3 3 hsa-miR-191-5p 16 39 24AT u-AAA 0 d-T CGGGCAAC GCTGTTGT miRNA 1
hsa-miR-454-3p_TAGTGCAATATTGCTTATAGGGTAT TAGTGCAATATTGCTTATAGGGTAT seq_100033_x2 2 hsa-miR-454-3p 64 86 0 u-AT 0 0 TGAGTAGT GGGTTTTG miRNA 1
hsa-miR-191-5p_CAACGGAATCCGAAAAGCAGCTG CAACGGAATCCGAAAAGCAGCTG seq_100037_x16 16 hsa-miR-191-5p 16 38 12GC 0 0 0 CGGGCAAC GCTGTTGT miRNA 1
Since awk
don't have inline editing option, you will have to use gawk
for inline editing. Using awk
, you can write the output to temporary file and then move/copy/rename it to original file name.
For using the command in multiple file:
for i in Miraligner_*.txt.mirna; do
awk '{print $4"_"$1, $0}' "$i" | column -t;
done
If you are using gawk
and interested in doing inline editing, use gawk -i inplace
Using perl
:
perl -ane 'print "$F[3]_$F[0] $_";' filename | column -t
If you want to write to file, use -i
option:
perl -ane 'print "$F[3]_$F[0] $_";' -i filename
Separate all the input fields and the appended column (field) with \t
:
perl -ane '$"="\t"; print "$F[3]_$F[0] @F\n";' -i filename
If you want your output in your file in proper tabular form:
for i in Miraligner_*.txt.mirna; do
awk '{print $4"_"$1, $0}' "$i" | column -t > temp && mv temp "$i";
done
This will give output separated into proper column in your file. For this, you won't need inline editing option.
Thanks to @EdMorton for correcting my mistakes.
Upvotes: 4