BioMan

Reputation: 704

Split string and switch output columns

My input is:

TGCCTCAGTTCAGCAGGAACAGT_1       __not_aligned
CGCCCGATCTCGTCTGATCTCG_0        __too_low_aQual
TTTTAACGCGGACCAGAAACTA_2        __not_aligned
TACCGTGTAGAACCGAATTTGT_69       mir-10
AGGAAGCCCTGGAGGGGCTGGAGA_0      mir-671

I want the output to be:

   __not_aligned    1
   __too_low_aQual  0
   __not_aligned    2
   mir-10           69
   mir-671          0

I was trying to use the cut command, but I am not sure how to switch the columns and get that specific output. My attempt so far:

cut -d _ -f 
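One reason cut alone cannot produce this output: it never reorders fields. Whatever order you list after -f, the selected fields come out in the order they appear in the input. A quick check on one sample line (assuming a single space as the separator):

```shell
# cut always prints the selected fields in their input order,
# so asking for -f 2,1 still gives field 1 before field 2.
printf 'TGCCTCAGTTCAGCAGGAACAGT_1 __not_aligned\n' | cut -d ' ' -f 2,1
# prints: TGCCTCAGTTCAGCAGGAACAGT_1 __not_aligned
```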

Upvotes: 1

Views: 73

Answers (5)

Manuel Barbe

Reputation: 2164

You may try this with gawk:

gawk '{match($1,"[0-9]+",a)}{print $2,a[0]}' file

outputs:

__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0

Or, more portably, with any POSIX awk (thanks to Ed Morton):

awk '{match($1,/[0-9]+/); print $2, substr($1,RSTART,RLENGTH)}' file
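A minimal sketch of the POSIX match() approach wrapped in a shell function (the name swap_posix_awk is hypothetical), with comments on what match() sets. It assumes, as in the sample, that field 1 contains no digits before the _COUNT suffix, which holds for A/C/G/T sequences.

```shell
# Hypothetical wrapper: reads SEQUENCE_COUNT<whitespace>LABEL lines on stdin.
swap_posix_awk() {
  awk '{
    # match() sets RSTART (start index) and RLENGTH (length) of the
    # first run of digits in field 1, e.g. the "69" in "..._69".
    match($1, /[0-9]+/)
    # Print the label (field 2), then that substring of field 1.
    print $2, substr($1, RSTART, RLENGTH)
  }'
}

# Usage on two lines of the sample input:
printf 'TGCCTCAGTTCAGCAGGAACAGT_1\t__not_aligned\nTACCGTGTAGAACCGAATTTGT_69\tmir-10\n' | swap_posix_awk
```

Taking the substring from $1 rather than $0 keeps RSTART relative to the field it was computed on, so leading whitespace on a line cannot shift the result.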

Upvotes: 2

Chris Koknat

Reputation: 3451

If Perl is an option:

perl -lne 'if (/^([ACGT]+)_(\d+)\s+(.*)/){print "$3 $2"}' file

The regex captures 3 groups:
^([ACGT]+) one or more A/C/G/T bases at the start of the line, followed by an underscore
(\d+) one or more digits, followed by whitespace (\s+)
(.*) the rest of the line

If the regex matches, print the 3rd capture group, a space, and the 2nd capture group.

Upvotes: 0

pcantalupo

Reputation: 2226

If you really want to use cut, combine it with paste to get your output (the <(...) process substitution requires bash, ksh, or zsh):

paste <(tr -s '\t ' ' ' < foo.txt | cut -f 2 -d ' ') <(tr -s '\t ' ' ' < foo.txt | cut -f 1 -d ' ' | cut -f 2 -d _)

__not_aligned   1
__too_low_aQual 0
__not_aligned   2
mir-10  69
mir-671 0
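Under a plain POSIX sh without process substitution, the same tr/cut/paste idea can be sketched with temporary files. This assumes mktemp is available (it is on Linux, the BSDs and macOS) and that lines have no leading whitespace; the function name swap_with_cut is hypothetical.

```shell
# Hypothetical function: reads SEQUENCE_COUNT<whitespace>LABEL lines on stdin,
# prints "LABEL COUNT" using only tr, cut and paste (no process substitution).
swap_with_cut() {
  norm=$(mktemp) && labels=$(mktemp) || return 1
  # Turn every run of tabs and/or spaces into a single space.
  tr -s '\t ' ' ' > "$norm"
  # Field 2 is the label; save it so paste can read it as a file.
  cut -d ' ' -f 2 "$norm" > "$labels"
  # Field 1 is SEQUENCE_COUNT; its 2nd "_"-separated piece is the count.
  # paste reads the counts from stdin ("-") and joins them after the labels.
  cut -d ' ' -f 1 "$norm" | cut -d _ -f 2 | paste -d ' ' "$labels" -
  rm -f "$norm" "$labels"
}

printf 'TGCCTCAGTTCAGCAGGAACAGT_1\t__not_aligned\nTACCGTGTAGAACCGAATTTGT_69   mir-10\n' | swap_with_cut
```

Squeezing tabs into spaces first matters because cut -d ' ' treats a tab as part of a field, so tab-separated input would otherwise not split.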

Upvotes: -1

Ed Morton

Reputation: 203532

$ awk '{sub(/[^_]+_/,""); print $2, $1}' file
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
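The question's desired output has the counts lined up in one column. A sketch building on the same sub() trick but using printf, assuming a fixed label width of 16 characters (one more than the longest label in the sample, __too_low_aQual); the function name swap_aligned is hypothetical.

```shell
# Hypothetical variant of the awk answer: same sub() trick, but printf
# pads the label to 16 characters so the counts line up in one column.
swap_aligned() {
  awk '{
    sub(/[^_]+_/, "")             # remove "SEQUENCE_", leaving "COUNT LABEL"
    printf "%-16s %s\n", $2, $1   # label left-justified, then the count
  }'
}

printf 'CGCCCGATCTCGTCTGATCTCG_0        __too_low_aQual\nTACCGTGTAGAACCGAATTTGT_69       mir-10\n' | swap_aligned
```

After sub() modifies $0, awk re-splits the record, so $1 becomes the count and $2 the label.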


$ sed -r 's/[^_]+_([0-9]+)[[:space:]]+(.*)/\2 \1/' file
__not_aligned 1
__too_low_aQual 0
__not_aligned 2
mir-10 69
mir-671 0
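The -r flag is the GNU spelling for extended regular expressions; -E is, as far as I know, the more widely supported spelling (GNU sed and BSD/macOS sed both accept it). The same substitution with -E, wrapped in a hypothetical swap_sed function:

```shell
# Hypothetical wrapper: same substitution, extended regexes enabled with -E.
swap_sed() {
  sed -E 's/[^_]+_([0-9]+)[[:space:]]+(.*)/\2 \1/'
}

printf 'TTTTAACGCGGACCAGAAACTA_2        __not_aligned\n' | swap_sed
```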

Upvotes: 1

Jose Ricardo Bustos M.

Reputation: 8164

You can try using sed instead of cut (the \+ and \t escapes below are GNU sed extensions):

sed 's/[ACGT]\+_\([0-9]\+\)[ \t]\+\([^ \t]\+\)/\2\t\1/g' file

You get:

__not_aligned   1
__too_low_aQual 0
__not_aligned   2
mir-10  69
mir-671 0
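For a sed without the GNU extensions, a sketch that uses -E (so + needs no backslash) and splices a literal tab into the script via printf instead of relying on \t; the function name swap_tab is hypothetical.

```shell
# Hypothetical portable variant: ERE via -E instead of GNU-only \+,
# and a literal tab spliced into the script instead of GNU-only \t.
swap_tab() {
  tab=$(printf '\t')
  sed -E "s/^[ACGT]+_([0-9]+)[[:space:]]+([^[:space:]]+)/\\2${tab}\\1/"
}

printf 'TACCGTGTAGAACCGAATTTGT_69       mir-10\n' | swap_tab
```

Inside the double-quoted script, \\2 reaches sed as the back-reference \2, and ${tab} expands to an actual tab character.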

Upvotes: 0
