Reputation: 704
My input is:
TGCCTCAGTTCAGCAGGAACAGT_1 __not_aligned
CGCCCGATCTCGTCTGATCTCG_0 __too_low_aQual
TTTTAACGCGGACCAGAAACTA_2 __not_aligned
TACCGTGTAGAACCGAATTTGT_69 mir-10
AGGAAGCCCTGGAGGGGCTGGAGA_0 mir-671
I want the output to be:
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
I was trying to use cut, but I am not sure how to swap the columns and get that specific output.
cut -d _ -f
Upvotes: 1
Views: 73
Reputation: 2164
You may try this with gawk:
awk '{match($1,"[0-9]+",a)}{print $2,a[0]}' file
outputs:
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
Or, better, with POSIX awk (thanks to Ed Morton):
awk --posix '{match($1,/[0-9]+/);print $2,substr($1,RSTART,RLENGTH)}' file
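To see what match() leaves behind in the POSIX version, here is a minimal sketch run on one line of the question's input. The variable names line and info are only for illustration. RSTART is the 1-based position of the first digit in $1 and RLENGTH is the length of the digit run.

```shell
# One line from the question's input.
line='TACCGTGTAGAACCGAATTTGT_69 mir-10'

# match() sets RSTART and RLENGTH; substr() then cuts the digits out of $1.
info=$(printf '%s\n' "$line" |
  awk '{ match($1, /[0-9]+/); print RSTART, RLENGTH, substr($1, RSTART, RLENGTH) }')

printf '%s\n' "$info"
```

The 22 bases plus the underscore come first, so the digits start at position 24 and are 2 characters long.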
Upvotes: 2
Reputation: 3451
If Perl is an option:
perl -lne 'if (/^([ACGT]+)_(\d+)\s+(.*)/){print "$3 $2"}' file
The regex captures 3 fields:
^([ACGT]+) : starting with one or more ACGT bases, followed by an underscore
(\d+) : one or more digits, followed by \s+ (whitespace)
(.*) : anything
If the regex matches, print the 3rd field and then the 2nd field.
Upvotes: 0
Reputation: 2226
If you really want to use cut, combine it with paste to get your output:
paste <(tr -s '\t ' < foo.txt | cut -f 2 -d ' ') <(cut -f 1 -d ' ' foo.txt | cut -f 2 -d _)
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
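The one-liner is easier to follow when split into its steps. Here is a sketch that assumes the two columns are separated by a single space and swaps the process substitution for temporary files so it also runs in plain sh. The names input, tmp and result are just for illustration.

```shell
# Two lines from the question's input (single-space separated, an assumption).
input='TGCCTCAGTTCAGCAGGAACAGT_1 __not_aligned
TACCGTGTAGAACCGAATTTGT_69 mir-10'

tmp=$(mktemp -d)

# Step 1: the second space-separated column is the label.
printf '%s\n' "$input" | cut -d ' ' -f 2 > "$tmp/labels"

# Step 2: take the first column, then the part after the underscore.
printf '%s\n' "$input" | cut -d ' ' -f 1 | cut -d _ -f 2 > "$tmp/numbers"

# Step 3: paste joins the two files line by line, separated by a tab.
result=$(paste "$tmp/labels" "$tmp/numbers")
rm -r "$tmp"

printf '%s\n' "$result"
```

Note that paste separates the columns with a tab by default; use paste -d ' ' if a space is needed.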
Upvotes: -1
Reputation: 203532
$ awk '{sub(/[^_]+_/,""); print $2, $1}' file
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
Or, with GNU sed:
$ sed -r 's/[^_]+_([0-9]+)[[:space:]]+(.*)/\2 \1/' file
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
Upvotes: 1
Reputation: 8164
You can try using sed instead of cut:
sed 's/[ACGT]\+_\([0-9]\+\)[ \t]\+\([^ \t]\+\)/\2\t\1/g' file
You get:
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
Upvotes: 0