Reputation: 7790
I have a string that looks like this:
>Bounded_RNA_of:1DDL:Elength : 1
Regex-wise, it can be written this way:
>Bounded_RNA_of:(\w+):(\w)length : 1
At the end of the day, what I want to extract is just 1DDL and E.
But why does this regex fail?
import re
seq=">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r'(>Bounded_RNA_of:(\w+):(\w)length : 1)',seq)
print match.group()
# prints this:
# >Bounded_RNA_of:1DDL:Elength : 1
What's the way to do it?
Upvotes: 1
Views: 1652
Reputation: 41838
Others have already answered, but I'd like to suggest a more precise regex for the task:
import re
subject = ">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r">\w+:([^:]+):(\w)", subject)
if match:
    print match.group(1)
    print match.group(2)
Regex Explanation
The leading > serves as an anchor that helps the engine know we are looking in the right place. It helps prevent backtracking later.
\w+: matches what comes before the first colon, plus the colon itself.
([^:]+) captures any characters that are not a : to Group 1.
: matches the second colon.
(\w) captures the next word character to Group 2.
Upvotes: 0
Reputation: 3880
>>> match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1',seq)
>>> print match.group(1,2)
('1DDL', 'E')
Upvotes: 1
Reputation: 163
Don't wrap the whole pattern in parentheses, as in:
match = re.search(r'(>Bounded_RNA_of:(\w+):(\w)length : 1)',seq)
It should be:
match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1',seq)
And then you can extract 1DDL and E with:
print match.group(1)
print match.group(2)
EDIT: If you want to keep the outer parentheses, the two values move to groups 2 and 3, so you can extract them with:
print match.group(2)
print match.group(3)
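To see why the numbers shift, here is a small sketch comparing the two patterns. Groups are numbered by the position of their opening parenthesis, and group() with no argument (the same as group(0)) always returns the entire match. The variable names wrapped and plain are only for illustration, and the snippet uses print(...) with a single argument so it runs on Python 3 as well:

```python
import re

seq = ">Bounded_RNA_of:1DDL:Elength : 1"

# Pattern with the extra outer parentheses, as in the question.
wrapped = re.search(r'(>Bounded_RNA_of:(\w+):(\w)length : 1)', seq)
# Same pattern without the outer parentheses.
plain = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1', seq)

wrapped_groups = wrapped.groups()  # outer group comes first, then the two inner ones
plain_groups = plain.groups()      # only the two inner groups
whole = wrapped.group()            # group(0): the entire matched text

print(wrapped_groups)
print(plain_groups)
print(whole)
```

With the outer parentheses, group 1 is the whole string, so the two values you want end up in groups 2 and 3.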
Upvotes: 0
Reputation: 15328
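This is due to the outer capturing parentheses: they capture the whole string as group 1, and match.group() with no argument returns the entire match anyway. Capture only the two elements you need and ask for them by number.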
import re
seq=">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1',seq)
print match.group(1), match.group(2)
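If you need this in more than one place, one possible way to package it is a small helper with named groups, so you don't have to count parentheses at all. This is only a sketch: HEADER_RE, parse_header and the group names name and letter are hypothetical names chosen for illustration, and the code is written for Python 3.

```python
import re

# Hypothetical names: HEADER_RE, parse_header, and the groups "name" / "letter".
HEADER_RE = re.compile(r'>Bounded_RNA_of:(?P<name>\w+):(?P<letter>\w)length : 1')

def parse_header(line):
    """Return (name, letter) from a header line, or None if it does not match."""
    match = HEADER_RE.search(line)
    if match is None:
        # re.search returns None on failure, so guard before calling group().
        return None
    return match.group('name'), match.group('letter')

print(parse_header(">Bounded_RNA_of:1DDL:Elength : 1"))
print(parse_header("not a header"))
```

Checking for None first avoids an AttributeError on lines that don't have the expected shape.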
Upvotes: 3
Reputation: 53525
Simply print:
print match.group(2)
print match.group(3)
OUTPUT
1DDL
E
Upvotes: 1