Reputation: 579
I have a file in which I need to append a line (containing a couple of characters) after the last line that starts with each 3-digit number. The data are grouped by gene number (122, 239, ...; there are many genes), and each gene may have a variable number of individuals.
cat test
122_mex1 TGCAGGC
122_mex2 TGCAGTC
122_mex3 TGCAGTC
122_can4 TGCATTT
239_mex1 TGCAAAA
239_mex2 TGCAAAA
239_can4 TGCAGCA
...
567_can4 TGCAAAT
The output should look like this:
cat output
122_mex1 TGCAGGC
122_mex2 TGCAGTC
122_mex3 TGCAGTC
122_can4 TGCATTT
//|1
239_mex1 TGCAAAA
239_mex2 TGCAAAA
239_can4 TGCAGCA
//|2
etc.
How can I find the last line for each gene number (the number at the start of the line) and append a line after it with some characters and a counter that goes up (1, 2, 3, etc.)?
I have found a way to append a line after a given match (e.g. 122):
awk '/122/{seen++} seen && !/122/{print "//|1"; seen=0} 1' test
but I'd like to do this for all gene numbers (122, 239, 455, 234, etc.), looping over the genes and appending a line "//|i" after each one, where i is the successive group number.
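To see why the one-gene command does not generalize, here is a small sketch that wraps it in a shell function (the name `divider_after_122` is hypothetical) and feeds it the two-gene sample through a here-document instead of the `test` file. `seen` counts lines matching `/122/`; the first non-matching line after them triggers a single `//|1` and resets the flag, so for data like this 239 and every later gene get no divider. Note also that `/122/` matches 122 anywhere on the line, not only as the leading gene number.

```shell
#!/bin/sh
# Hypothetical wrapper around the asker's one-gene command; with no file
# arguments, awk reads standard input.
divider_after_122() {
  # seen++ runs on every line containing "122" (anywhere on the line).
  # The first later line WITHOUT "122" prints "//|1" and resets seen to 0,
  # so for this data no divider is printed after 239 or any later gene.
  awk '/122/{seen++} seen && !/122/{print "//|1"; seen=0} 1' "$@"
}

# Two-gene sample from the question, supplied via a here-document.
divider_after_122 <<'EOF'
122_mex1 TGCAGGC
122_mex2 TGCAGTC
122_mex3 TGCAGTC
122_can4 TGCATTT
239_mex1 TGCAAAA
239_mex2 TGCAAAA
239_can4 TGCAGCA
EOF
```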
Any thoughts on how to start this?
Thanks!
Upvotes: 2
Views: 133
Reputation: 784968
You can use awk:
awk -F_ 'p==""{p=$1} p != $1 {p=$1; print "//|" ++i} 1; END{print "//|" ++i}' test
122_mex1 TGCAGGC
122_mex2 TGCAGTC
122_mex3 TGCAGTC
122_can4 TGCATTT
//|1
239_mex1 TGCAAAA
239_mex2 TGCAAAA
239_can4 TGCAGCA
//|2
Explanation:
-F_ # set the field separator to _
p==""{p=$1} # on the first line (p not yet set), store the 1st field in p
p != $1 # if the 1st field differs from the previous value of the 1st field
{p=$1; print "//|" ++i} # set p=$1 and print the divider line with an incrementing counter
1; # default action: print each record
END{print "//|" ++i} # END block: print the divider line one last time, after the final group
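A self-contained sketch of this approach, wrapped in a shell function so it can be tried without the `test` file. The name `add_dividers` is hypothetical, and the `if (NR)` guard in the END rule is my addition so that an empty input prints nothing. Because the first-line rule has no `next`, the first record is printed like every other record.

```shell
#!/bin/sh
# Hypothetical helper; -F_ makes $1 the gene number (text before the first _).
add_dividers() {
  awk -F_ '
    p == "" { p = $1 }                    # p unset (first line): remember the gene
    p != $1 { p = $1; print "//|" ++i }   # gene changed: close the previous group
    1                                     # print the current line unchanged
    END { if (NR) print "//|" ++i }       # after the last line: close the final group
  ' "$@"
}

# Usage on a file: add_dividers test > output
add_dividers <<'EOF'
122_mex1 TGCAGGC
122_can4 TGCATTT
239_mex1 TGCAAAA
239_can4 TGCAGCA
EOF
```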
Upvotes: 1
Reputation: 85775
This will do the trick:
$ awk -F_ 'NR>1 && last!=$1{print "//|"++i}{last=$1}1' test
122_mex1 TGCAGGC
122_mex2 TGCAGTC
122_mex3 TGCAGTC
122_can4 TGCATTT
//|1
239_mex1 TGCAAAA
239_mex2 TGCAAAA
239_can4 TGCAGCA
//|2
...
//|3
567_can4 TGCAAAT
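This command prints no divider after the last gene group (567 above). If a divider should also follow the final group (the question's sample output ends each group shown with one), a minimal variant adds an END rule. The END rule, its `if (NR)` guard, and the `dividers_with_last` function name are illustrative additions, not part of the answer.

```shell
#!/bin/sh
# Answer 2's program plus one extra rule (an illustrative addition):
# END{if (NR) print ...} closes the final group; the NR guard keeps an
# empty input from producing a lone divider. The function name is hypothetical.
dividers_with_last() {
  awk -F_ 'NR>1 && last!=$1{print "//|"++i}{last=$1}1; END{if (NR) print "//|"++i}' "$@"
}

dividers_with_last <<'EOF'
239_can4 TGCAGCA
567_can4 TGCAAAT
EOF
```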
To save the results, use shell redirection:
$ awk -F_ 'NR>1 && last!=$1{print "//|"++i}{last=$1}1' test > output
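One caveat with redirection: `awk ... test > test` does not work, because the shell truncates `test` before awk reads it. A minimal sketch for rewriting the file in place, assuming `mktemp` is available (it is on Linux, the BSDs and macOS), writes to a temporary file next to the original and renames it only if awk succeeds. `add_dividers_in_place` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: rewrite file $1 in place using answer 2's awk program.
# The temporary file is created in the same directory so mv is a rename.
add_dividers_in_place() {
  f=$1
  tmp=$(mktemp "${f}.XXXXXX") || return 1
  if awk -F_ 'NR>1 && last!=$1{print "//|"++i}{last=$1}1' "$f" > "$tmp"; then
    mv "$tmp" "$f"      # replace the original only after awk succeeded
  else
    rm -f "$tmp"        # leave the original untouched on failure
    return 1
  fi
}

# Usage: add_dividers_in_place test
```

Because `mktemp` creates the temporary file with owner-only permissions, the rewritten file may end up with different permissions than the original. With GNU awk 4.1 or later, `gawk -i inplace` is another option.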
Upvotes: 3