Reputation: 2082
I have a list of DNA sequences (one per line):
ACTGCTCGGGGG.....
CGCTCGCTTCTCTC...
etc
Most sequences contain two specific motifs, one close to the beginning and one closer to the end. I am extracting the sequences in between with grep:
grep "motif1.*motif2" inputfile > outputfile
and in Ruby with scan, where sequences is an array of DNA sequences:
sequences.each do |seq|
  tmp = seq.scan(/motif1.*motif2/)[0]
  outputfile << tmp if tmp
end
The problem is that the two approaches give me a different number of extracted sequences. Why?
Upvotes: 3
Views: 136
Reputation: 4551
Ruby's scan returns an array of the matched parts of the string by default. grep doesn't do that: it prints each whole line that contains a match, with the match highlighted if --color is set to auto and the output is a terminal. To get only the matched parts from grep, use the -o option.
grep -o "motif1.*motif2" inputfile > outputfile
The command above should save the same output as Ruby's scan does.
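To see the difference between the two behaviours from the Ruby side, here is a minimal sketch. The sample sequences are made up, and "AAA" and "TTT" are placeholders standing in for motif1 and motif2, so treat it as an illustration rather than your actual data:

```ruby
# Hypothetical sample data; "AAA" and "TTT" stand in for motif1 and motif2.
sequences = ["CCAAAGGGTTTCC", "CCCCCC", "AAACTTTG"]
pattern = /AAA.*TTT/

# Like plain grep: keep the whole line whenever it contains a match.
whole_lines = sequences.grep(pattern)

# Like grep -o: keep only the matched part of each line.
matched_parts = sequences.flat_map { |seq| seq.scan(pattern) }

puts whole_lines.inspect
puts matched_parts.inspect
```

The first array holds complete lines, the second holds only the text from the first motif through the second, which is what grep -o prints.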
Upvotes: 2