Simon Brunner

Reputation: 395

Regex - find part of string

I've got data of this type (repeated many times):

@@@FFDFFHHHHHJJFFHGIJJJJGI   
@M00332:5:000000000-A0TVJ:1:1:13498:26189 2:N:0:1   
ACCACAGCCGCTGCCCATTTGCATAA 
+

Using a regex, I'm trying to select all lines that contain a specific string, cagccgctgcccatttg. I'm a regex newbie, and so far I've tried this: \w{3,}(cagccgctgcccatttg)\w{3,}

Any help is much appreciated.

Cheers Simon

Upvotes: 1

Views: 82

Answers (1)

VolatileRig

Reputation: 2847

From what I understand, you want to collect every sequence that contains a given sub-sequence. I don't know what environment you're using, but this pattern should match the sequences you're looking for in a fairly simple way:

([ACGT]{3,}CAGCCGCTGCCCATTTG[ACGT]{3,})

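The square brackets form a character class, which matches any single character listed inside it. You don't want \w, because it also matches digits, underscores, and every other letter; you only want to match a character if it's one of the four bases. The {3,} on each side requires at least three bases before and after the target. Note that the pattern is upper-case, like your data, so the lower-case string in your attempt won't match unless you make the regex case-insensitive. Finally, wrapping the whole regex in parentheses captures the entire matched sequence as a group.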
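Since the environment isn't specified, here is a minimal sketch in Python (my assumption, not necessarily what you're using) that applies the pattern to the sample lines from the question. The helper name matching_sequences is made up for illustration.

```python
import re

# Pattern from the answer: at least 3 bases on each side of the target.
PATTERN = re.compile(r"([ACGT]{3,}CAGCCGCTGCCCATTTG[ACGT]{3,})")


def matching_sequences(lines):
    """Return the captured sequence from every line that contains the target."""
    hits = []
    for line in lines:
        match = PATTERN.search(line.strip())
        if match:
            hits.append(match.group(1))  # group 1 is the whole parenthesised match
    return hits


# The four sample lines from the question.
sample = [
    "@@@FFDFFHHHHHJJFFHGIJJJJGI   ",
    "@M00332:5:000000000-A0TVJ:1:1:13498:26189 2:N:0:1   ",
    "ACCACAGCCGCTGCCCATTTGCATAA ",
    "+",
]

print(matching_sequences(sample))  # ['ACCACAGCCGCTGCCCATTTGCATAA']
```

If the data may contain lower-case bases, passing re.IGNORECASE as the second argument to re.compile would make the match case-insensitive. If you only need to know whether the target occurs anywhere in a line, regardless of what flanks it, a plain substring test would be enough.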

Upvotes: 3
