user175259

Reputation: 4921

Creating Regular Expressions in Python

I'm trying to write a regular expression that extracts part of the following text:

amd64 build of software 1:0.98.10-0.2svn20090909 in archive

What I want to extract is:

software 1:0.98.10-0.2svn20090909

How can I do this? Here is what I have so far:

import re

# Raw string so the backslashes reach the regex engine unchanged
p = re.compile(r'([a-zA-Z0-9\-\+\.]+)\ ([0-9\:\.\-]+)')
iterator = p.finditer("amd64 build of software 1:0.98.10-0.2svn20090909 in archive")
for match in iterator:
    print(match.group())

This prints:

software 1:0.98.10-0.2

(the trailing svn20090909 is missing)

Thanks a lot.

Upvotes: 0

Views: 223

Answers (3)

Escualo

Reputation: 42082

If your lines are consistent, that is, if each entry is on one line and the word you want always comes right before the version part (the 1:0.98 ... part), you don't need a regexp. Try this:

>>> s = 'amd64 build of software 1:0.98.10-0.2svn20090909 in archive'
>>> match = [s.split()[3], s.split()[4]]
>>> print(match)
['software', '1:0.98.10-0.2svn20090909']
>>> # alternatively, a slice gives the same result
>>> match = s.split()[3:5]

This first splits the line s on whitespace (using the string method split()) and then selects the fourth and fifth elements of the resulting list (indices 3 and 4); both are stored in the variable match.

Again, this only works if you have one entry per line and if the 'software' part always sits in the same position, immediately before the 1:0.98.10-0.2svn20090909 part.
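
The positional approach above can be wrapped in a small helper that fails loudly when a line is too short. This is only a sketch: the function name extract_name_version is made up for illustration, and the fixed positions 3 and 4 are an assumption based on the single sample line in the question.

```python
def extract_name_version(line):
    """Return [name, version] from a line shaped like
    '<arch> build of <name> <version> in archive'.

    Hypothetical helper: positions 3 and 4 are assumed from the
    one sample line shown in the question.
    """
    fields = line.split()  # split on runs of whitespace
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(fields))
    return fields[3:5]


s = 'amd64 build of software 1:0.98.10-0.2svn20090909 in archive'
name, version = extract_name_version(s)
print(name, version)
```

If the input ever has extra words before the package name, the indices shift and this helper returns the wrong fields without complaint, so it is only suitable when the layout is fixed.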

I often avoid regexps when splitting into a list will do. If the parsing becomes a nightmare, I use pyparsing.

Upvotes: 3

RichieHindle

Reputation: 281495

This will work:

import re

p = re.compile(r'([a-zA-Z0-9\-\+\.]+)\ ([0-9][0-9a-zA-Z\:\.\-]+)')
iterator = p.finditer("amd64 build of dvdrip software 1:0.98.10-0.2svn20090909 in archive")
for match in iterator:
    print(match.group())
# Prints: software 1:0.98.10-0.2svn20090909

That works by allowing the second captured section to contain letters while still insisting that it starts with a digit.

Without seeing all the other strings it needs to match, I can't be sure whether that's good enough.
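
As a sketch of how the two pieces could be pulled out separately, the same character classes can be given named groups. The group names name and version and the helper find_package are illustrative choices, not part of the pattern above, and the same caveat applies: it has only been checked against the sample lines in this thread.

```python
import re

# Same character classes as the pattern above, with named groups added.
# The group names are illustrative assumptions.
PATTERN = re.compile(
    r'(?P<name>[a-zA-Z0-9+.-]+) '          # package name, then one space
    r'(?P<version>[0-9][0-9a-zA-Z:.-]+)'   # a digit, then digits, letters, ':', '.', '-'
)


def find_package(line):
    """Return (name, version) for the first match in line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group('name'), m.group('version')


line = "amd64 build of software 1:0.98.10-0.2svn20090909 in archive"
print(find_package(line))
```

Inside a character class, + and . need no escaping, and - is literal when it comes last, which is why the backslashes are dropped here.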

Upvotes: 3

Azeem.Butt

Reputation: 5861

Don't use a capturing group if you want everything in one piece.
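
A minimal sketch of the difference, assuming the corrected character classes from the answer above: without groups, the match object holds only the whole match; with groups, the whole match is still available as group(0) and the pieces come separately.

```python
import re

text = "amd64 build of software 1:0.98.10-0.2svn20090909 in archive"

# No capturing groups: the match object exposes only the whole match.
whole = re.search(r'[a-zA-Z0-9+.-]+ [0-9][0-9a-zA-Z:.-]+', text)
print(whole.group())  # group() with no argument returns the whole match

# With capturing groups, group(0) is still the whole match;
# the pieces are available through groups().
parts = re.search(r'([a-zA-Z0-9+.-]+) ([0-9][0-9a-zA-Z:.-]+)', text)
print(parts.group(0) == whole.group())
print(parts.groups())
```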

Upvotes: 0
