user1647556

Regex Python Data Manipulation -- NoneType Object

I have a .txt file with data in the following format:

pq1000007 35 2 237493054 0.013328573 

I am trying to use a regex that will capture the first, third, and last numbers, but only if the last number is greater than 0.4. For some reason, I get the error "'NoneType' object has no attribute 'group'". Any ideas?

Code:

import re

InFileName = "PerkQP_CHGV_SCZ.txt"
InFile = open(InFileName, 'r')

OutFileName='PAZ_OUT' + ".txt"
OutFile=open(OutFileName, 'w')


for Line in InFile:
    match = re.search('(\w+)\s\d+\s(\d+)\s\d+\d+\s(\d+\.\d+)', Line)
    if match.group(2) > 0.4:
        c = match.group()
        print(c)
        OutFile.write(c+"\n")

InFile.close()
OutFile.close()

Upvotes: 1

Views: 223

Answers (2)

Tim Pietzcker

Reputation: 336378

A few problems:

match.group() returns a string, so you can't meaningfully compare it with a float (in fact, in Python 3, doing so raises a TypeError). In Python 2, any string will always compare greater than a float (CPython 2 orders objects of different types by an arbitrary but consistent rule in which numbers sort before everything else). Yes, this rule makes no sense. Good that Python 3 did away with it.
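A quick Python 3 check of both points, as a minimal sketch (the string below just stands in for whatever match.group(...) returns):

```python
# Python 3: ordering comparisons between str and float raise TypeError.
value = "0.013328573"  # stand-in for a value returned by match.group(...)

try:
    value > 0.4
    comparison_failed = False
except TypeError as exc:
    comparison_failed = True
    print("TypeError:", exc)

# Converting the string first gives a real numeric comparison.
print(float(value) > 0.4)  # False
```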

Then, the last number in that regex is in the third capturing group, so you'd need to do

if float(match.group(3)) > 0.4:
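To see which group holds what, here is the sample line from the question run through the pattern with the redundant \d+ removed (a sketch for Python 3):

```python
import re

line = "pq1000007 35 2 237493054 0.013328573 "
pattern = r'(\w+)\s\d+\s(\d+)\s\d+\s(\d+\.\d+)'

match = re.search(pattern, line)
print(match.groups())  # ('pq1000007', '2', '0.013328573')

# group(3) holds the last number; convert it before comparing.
last = float(match.group(3))
print(last > 0.4)  # False for this sample line
```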

Then, you should use a raw string (r"...") for your regex.
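Why the raw string matters, sketched in Python 3: in a raw string the backslashes reach the regex engine untouched, while in a normal string some sequences are turned into other characters first. \b is the classic trap (backspace in a normal string, word boundary in a regex):

```python
import re

# A raw string keeps backslashes literally; this is the same pattern
# spelled with doubled backslashes in a normal string.
raw = r'(\w+)\s\d+'
escaped = '(\\w+)\\s\\d+'
print(raw == escaped)  # True

# In a normal string '\b' is the backspace character, not a regex escape.
print('\b' == '\x08')  # True
print(re.search(r'\bpq', 'pq1000007') is not None)  # True: word boundary
print(re.search('\bpq', 'pq1000007') is not None)   # False: looks for a backspace
```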

Finally, \d+\d+ is redundant (all it adds is a requirement of at least two digits); \d+ will do.

match = re.search(r'(\w+)\s\d+\s(\d+)\s\d+\s(\d+\.\d+)', Line)

This regex matches the example line you gave, so your error message (which indicates a non-match) must come from somewhere else. Perhaps there is a line in your file that does not match the regex. In that case, you could structure your program like this:

for Line in InFile:
    match = re.search(r'(\w+)\s\d+\s(\d+)\s\d+\s(\d+\.\d+)', Line)
    if match:
        if float(match.group(3)) > 0.4:
            pass  # do stuff with match.group(1), match.group(2), match.group(3)
    else:
        print("No match: " + Line)
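One possible way to fill in the "# do stuff" part, as a Python 3 sketch. The helper names extract and process, and writing the three captured fields space-separated, are my own assumptions about what you want; non-matching lines are silently skipped here rather than printed:

```python
import re

# First field, third field and last (decimal) field of each line.
PATTERN = re.compile(r'(\w+)\s\d+\s(\d+)\s\d+\s(\d+\.\d+)')

def extract(lines, threshold=0.4):
    """Yield 'first third last' for matching lines whose last number > threshold."""
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            # Blank lines, headers, etc. are skipped instead of crashing.
            continue
        if float(match.group(3)) > threshold:
            yield " ".join(match.groups())

def process(in_name, out_name, threshold=0.4):
    """Read in_name and write the selected fields to out_name, one per line."""
    with open(in_name) as in_file, open(out_name, "w") as out_file:
        for text in extract(in_file, threshold):
            print(text)
            out_file.write(text + "\n")
```

With the file names from the question, that would be process("PerkQP_CHGV_SCZ.txt", "PAZ_OUT.txt"). Using with closes both files even if an exception is raised partway through.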

Upvotes: 1

BrenBarn

Reputation: 251428

If the result of the search is None, that means your regex is not matching. It seems to work for the example you give, but perhaps your actual data in the file doesn't match the pattern. (Also, I see that your regex contains \d+\d+ which should just be \d+.)
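A small Python 3 sketch of how the error arises. The two inputs are hypothetical examples of lines that would not match: a blank line, and a line whose last field has no decimal point:

```python
import re

pattern = r'(\w+)\s\d+\s(\d+)\s\d+\s(\d+\.\d+)'

# A blank line has nothing for the pattern to match, so search returns None.
match = re.search(pattern, "\n")
print(match)  # None

# Calling .group on None is exactly what produces the error in the question.
try:
    match.group(3)
except AttributeError as exc:
    caught = str(exc)
    print("AttributeError:", caught)

# Hypothetical line whose last field is a whole number: (\d+\.\d+) cannot match it.
print(re.search(pattern, "pq1000007 35 2 237493054 1"))  # None
```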

In addition, match.group returns a string. You need to convert it to a number (e.g. with float(match.group(3)), since the last number is in the third group) before comparing it to 0.4.

Upvotes: 1
