Having trouble with re and matching groups

Question

I'm pulling my hair out with Python regexps.

I have a string which contains the multi-line output from an os command.

One such line will contain a string like the following:

2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s

I want to parse "156.0 GB" into two matching groups. This field can also contain TB, MB, KB and possibly even just bytes, but for now I just want to focus on TB, GB, MB and KB. I'll deal with the scenario where it's just bytes later, if it arises.

    if self.type == "cpinstance":
        if re.search("of instance data copied", line):
            m = re.match("(?P<datasize>\d[.][\d]) (?P<units>TB|GB|MB|KB) of instance data copied", line)
            print m.group('datasize'), m.group('units')
            if m.group('units') == "GB":
                print "MATCH!!!!!"

I've tried scores of permutations of regexps and can't for the life of me get m.group to ever work.

Traceback (most recent call last):
  File "./listInstances.py", line 187, in <module>
    tscript = OSBTranscript(image.jobid)
  File "/devel/REPO/PYLIB/osb.py", line 833, in __init__
    print m.group('datasize'), m.group('units')
AttributeError: 'NoneType' object has no attribute 'group'

I'm sure it's something stupid staring me right in the face but currently eluding me. =p

Thanks for any help.

Kevin · Accepted Answer

re.match only matches at the beginning of the string, so it fails as soon as it reaches the date and time section. Try using re.search instead of re.match.

import re

line = "2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s"

if re.search("of instance data copied", line):
    m = re.search("(?P<datasize>\d[.][\d]) (?P<units>TB|GB|MB|KB) of instance data copied", line)
    print m.group('datasize'), m.group('units')
    if m.group('units') == "GB":
        print "MATCH!!!!!"

Result:

6.0 GB
MATCH!!!!!
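
The difference between the two calls can be seen in isolation with a minimal sketch. It uses Python 3 print syntax, and the shortened sample string and the unnamed pattern are only for illustration, not taken from the original script:

```python
import re

# Shortened sample line (illustrative only).
line = "2015/04/13.16:26:07 156.0 GB of instance data copied"
pattern = r"\d[.]\d (?:TB|GB|MB|KB)"

# re.match only tries the pattern at position 0, where the date starts.
match_result = re.match(pattern, line)

# re.search scans forward until the pattern fits somewhere in the string.
search_result = re.search(pattern, line)

print(match_result)
print(search_result.group(0))
```

Here re.match returns None, which is what produced the AttributeError in the question, while re.search finds "6.0 GB" further along the line.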

Good start, but it only matches one digit before the decimal point. Try putting a star after your first \d. (Use a plus instead if you don't want to match numbers like ".5", since a plus requires at least one digit before the point.)

import re

line = "2015/04/13.16:26:07 156.0 GB of instance data copied, dev_iosecs 1887, dev_iorate 88.8 MB/s"

if re.search("of instance data copied", line):
    m = re.search("(?P<datasize>\d*[.][\d]) (?P<units>TB|GB|MB|KB) of instance data copied", line)
    print m.group('datasize'), m.group('units')
    if m.group('units') == "GB":
        print "MATCH!!!!!"

Result:

156.0 GB
MATCH!!!!!
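
Since re.search returns None for any line that doesn't fit, it may be worth checking the result before calling group. The following is a rough sketch in Python 3, not the original script: the helper name parse_copied_size is hypothetical, and the pattern assumes the size always has at least one digit on each side of the decimal point, which the question doesn't confirm.

```python
import re

# Hypothetical helper, not part of the original script.
# Assumption: at least one digit on each side of the point (\d+),
# so "156.25" matches but ".5" and "156" do not.
SIZE_RE = re.compile(
    r"(?P<datasize>\d+[.]\d+) (?P<units>TB|GB|MB|KB) of instance data copied"
)


def parse_copied_size(line):
    """Return (size, units) for a matching line, or None if nothing matches."""
    m = SIZE_RE.search(line)
    if m is None:
        # No match: avoid the AttributeError on 'NoneType' from the question.
        return None
    return float(m.group("datasize")), m.group("units")


line = ("2015/04/13.16:26:07 156.0 GB of instance data copied, "
        "dev_iosecs 1887, dev_iorate 88.8 MB/s")
print(parse_copied_size(line))
print(parse_copied_size("some unrelated line"))
```

For the sample line this gives (156.0, 'GB'), and None for the unrelated one, so the caller can skip lines that don't carry a size instead of crashing.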
