kwicher

Reputation: 2082

Discrepancy between Linux grep and Ruby's scan results

I have a list of DNA sequences (one per line):

ACTGCTCGGGGG.....

CGCTCGCTTCTCTC...

etc

Most sequences contain two specific motifs, one close to the beginning and one closer to the end. I am extracting the sequences in between:

  1. with grep: grep "motif1.*motif2" inputfile > outputfile
  2. in ruby with scan, where sequences is an array of DNA sequences:

     sequences.each do |seq|
       tmp = seq.scan(/motif1.*motif2/)[0]
       outputfile << tmp if tmp
     end
    

The problem is that I am getting a different number of extracted sequences. Why?

Upvotes: 3

Views: 136

Answers (1)

ShellFish

Reputation: 4551

Ruby's scan returns an array of the parts of the string that matched the regex. grep doesn't do that by default: it prints the whole line containing the match, with the match highlighted if color is set to auto. To retrieve only the matched parts from grep, use the -o option.

grep -o "motif1.*motif2" inputfile > outputfile

The previous command should save the same output as Ruby's scan does.
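To see the difference on the Ruby side, here is a minimal sketch. The motifs (ACTG and GGGG) and the three sequences are made up for illustration and are not taken from the question. It contrasts keeping only the matched part (what scan gives you, and what grep -o prints) with keeping the whole matching line (what plain grep prints).

```ruby
# Hypothetical motifs and sequences, chosen only for illustration.
motif_re = /ACTG.*GGGG/

sequences = [
  "TTACTGCCCCGGGGAA", # contains both motifs
  "CGCTCGCTTCTCTC",   # contains neither motif
  "ACTGTTTT"          # contains only the first motif
]

# Like `grep -o`: keep only the matched part of each sequence.
# scan returns [] when nothing matches, so [0] is nil and compact drops it.
matched_parts = sequences.map { |seq| seq.scan(motif_re)[0] }.compact

# Like plain `grep`: keep the whole sequence when it contains a match.
matched_lines = sequences.grep(motif_re)

puts matched_parts.inspect
puts matched_lines.inspect
```

With this sample data both approaches select the same single sequence, but the first keeps only the stretch from the first motif to the second, while the second keeps the flanking bases as well.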

Upvotes: 2
