pdubois

Reputation: 7790

Capturing subset of a string using Python's regex

I have a string that looks like this:

>Bounded_RNA_of:1DDL:Elength : 1

As a regex, it can be written this way:

>Bounded_RNA_of:(\w+):(\w)length : 1

At the end of the day what I want to extract is just 1DDL and E.

But why does this regex fail?

import re
seq=">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r'(>Bounded_RNA_of:(\w+):(\w)length : 1)',seq)
print match.group()

# prints this:
# >Bounded_RNA_of:1DDL:Elength : 1

What's the way to do it?

Upvotes: 1

Views: 1652

Answers (5)

zx81

Reputation: 41838

Others have already answered, but I'd like to suggest a more precise regex for the task:

import re
subject = ">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r">\w+:([^:]+):(\w)", subject)
if match:
    print match.group(1)
    print match.group(2)

Regex Explanation

  • The > is a literal marker for the start of the header. It tells the engine where to begin, so it does not waste time trying \w+: from other positions.
  • The \w+: matches what comes before the first colon, plus the colon itself.
  • The ([^:]+) captures any chars that are not a : to Group 1.
  • Then we match the second :
  • (\w) captures the single word character that follows (here E) to Group 2.
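
The breakdown above can be sketched as one annotated pattern. This is only an illustration: it uses re.VERBOSE so each piece can carry a comment, the variable names code and letter are my own, and it uses print() so it also runs on Python 3.

```python
import re

subject = ">Bounded_RNA_of:1DDL:Elength : 1"

# Same pattern as in the answer, spread out with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
pattern = re.compile(r"""
    >          # literal marker at the start of the header
    \w+:       # the label before the first colon, then the colon
    ([^:]+):   # group 1: everything up to the second colon
    (\w)       # group 2: one word character after the second colon
""", re.VERBOSE)

match = pattern.search(subject)
if match:
    code, letter = match.groups()  # hypothetical names for the two fields
    print(code, letter)
```

Unpacking match.groups() avoids repeating the group numbers when both fields are needed.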

Upvotes: 0

Suku

Reputation: 3880

>>> match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1',seq)
>>> print match.group(1,2)
('1DDL', 'E')

Upvotes: 1

KamilD

Reputation: 163

Don't wrap the whole pattern in parentheses:

match = re.search(r'(>Bounded_RNA_of:(\w+):(\w)length : 1)',seq)

It should be:

match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1',seq)

And then you can extract 1DDL and E with:

print match.group(1)
print match.group(2)

EDIT: If you want to keep the outer parentheses, you can extract the values with:

print match.group(2)
print match.group(3)
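
The shift in numbering comes from how groups are counted: by the position of their opening parenthesis, left to right. A small sketch comparing the two patterns side by side (the names wrapped and plain are just for illustration, and print() is used so it also runs on Python 3):

```python
import re

seq = ">Bounded_RNA_of:1DDL:Elength : 1"

# With the outer parentheses, the whole pattern is itself group 1,
# so the two fields are pushed to groups 2 and 3.
wrapped = re.search(r'(>Bounded_RNA_of:(\w+):(\w)length : 1)', seq)
print(wrapped.groups())

# Without them, the two fields are groups 1 and 2.
plain = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1', seq)
print(plain.groups())
```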

Upvotes: 0

daouzli

Reputation: 15328

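Calling match.group() with no argument returns the whole match, and the outer parentheses only add a redundant group around everything. Capture just the two elements you need and ask for them by number: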

import re
seq=">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1',seq)
print match.group(1), match.group(2)
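
It may help to see that group() with no argument is the same as group(0), the entire matched text, while the numbered groups hold the captured pieces. A minimal check, written with print() so it also runs on Python 3:

```python
import re

seq = ">Bounded_RNA_of:1DDL:Elength : 1"
match = re.search(r'>Bounded_RNA_of:(\w+):(\w)length : 1', seq)

# group() and group(0) both return the entire matched text.
whole = match.group()
print(whole == match.group(0))

# Numbered groups return only what each pair of parentheses captured.
print(match.group(1), match.group(2))
```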

Upvotes: 3

Nir Alfasi

Reputation: 53525

With your original regex, simply print groups 2 and 3:

print match.group(2)
print match.group(3)

OUTPUT

1DDL
E

Upvotes: 1
