MHibbin

Reputation: 1185

Python: Using regex stored in CSV

I am testing a small Python script, part of which I will use in a larger script. I am trying to look up a field in a CSV file (the field contains a regex) and use it in a regex test. The reason is part of a very weird use case: it lets me maintain the CSV file instead of the script. Is there something I am missing in the following?

test.csv:

field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"

test.py (extra prints used for testing):

import sys
import re
import csv

input = sys.argv[1]
print input

reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
        print row
        value = row[0]
        print value
        if value in input:
                regex = row[2]
                print regex

                pat = re.compile(regex)
                test = re.match(pat,input)
                out = test.group(1)
                print out

If I pass a value like "foo blah 38902462986.328946239846" to the script, I would expect it to detect that the input contains foo and then use the regex, \d+\.\d+, to extract 38902462986.328946239846. However, when I run the script I get the following:

foo blah 0920390239.90239029
['field0', 'field1', 'field2']
field0
['foo', 'bar', '\\d+\\.\\d+']
foo
\d+\.\d+
Traceback (most recent call last):
  File "reg.py", line 19, in <module>
    out = test.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Not sure what's going on really.

P.S. Python is a big world and I am still learning.

Upvotes: 0

Views: 1475

Answers (1)

detunized

Reputation: 15299

According to the docs, re.match only matches at the beginning of the input string, so you need to use re.search instead. There is also no need to compile a pattern if you don't reuse it afterwards; just say test = re.search(regex, input).
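The difference is easy to see on the question's own input. This is a minimal sketch written for Python 3 (the code in the question uses Python 2 print statements); the names `text`, `pattern`, `anchored` and `found` are illustrative and not from the question.

```python
import re

text = "foo blah 38902462986.328946239846"
pattern = r"\d+\.\d+"

# re.match only tries the pattern at position 0, where the text starts with "foo"
anchored = re.match(pattern, text)

# re.search scans forward and returns the first match anywhere in the text
found = re.search(pattern, text)

print(anchored)        # None
print(found.group(0))  # 38902462986.328946239846
```

Because re.match returns None here, calling .group() on its result is what raises the AttributeError shown in the question.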

The regular expressions in your example don't contain any capture groups, so test.group(1) is going to fail even if there's a match in the input. Use test.group(0) (or just test.group()) to get the whole match.
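To illustrate the point about capture groups, here is a small Python 3 sketch. The second pattern, with parentheses, is a hypothetical variation added for comparison and is not one of the patterns in the question's CSV file.

```python
import re

text = "foo blah 38902462986.328946239846"

# No parentheses in the pattern: only group(0), the whole match, exists
plain = re.search(r"\d+\.\d+", text)
print(plain.group(0))    # 38902462986.328946239846

try:
    plain.group(1)
except IndexError as exc:
    # Asking for a group the pattern does not define raises IndexError
    print("no group 1:", exc)

# With parentheses, group(1) and group(2) hold the text matched inside them
grouped = re.search(r"(\d+)\.(\d+)", text)
print(grouped.group(1))  # 38902462986
print(grouped.group(2))  # 328946239846
```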

import sys
import re
import csv

input = 'foo blah 38902462986.328946239846'

reader = csv.reader(open('test.csv','rb'), delimiter=',', quotechar="\"")
for row in reader:
    value = row[0]
    if value in input:
        regex = row[2]
        test = re.search(regex, input)
        if test:  # re.search returns None when the pattern is not found
            print test.group(0)

Prints:

38902462986.328946239846
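For reference, the same lookup can be sketched in Python 3 as a small function. This is only one possible shape, under stated assumptions: the function name `extract`, the `CSV_TEXT` stand-in for test.csv, and the choice to skip the header row and return None when nothing matches are all illustrative, not part of the original script.

```python
import csv
import io
import re

# Stand-in for the contents of test.csv, so the sketch is self-contained
CSV_TEXT = r'''field0,field1,field2
foo,bar,"\d+\.\d+"
bar,foo,"\w+"
'''

def extract(text, csv_file):
    """Return the first regex match from a row whose keyword occurs in text.

    Returns None if no keyword is found or no pattern matches.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for keyword, _unused, pattern in reader:
        if keyword in text:
            match = re.search(pattern, text)
            if match:  # re.search returns None when nothing matches
                return match.group(0)
    return None

print(extract("foo blah 38902462986.328946239846", io.StringIO(CSV_TEXT)))
# 38902462986.328946239846
```

With a real file in Python 3, the csv module documentation recommends opening it as `open('test.csv', newline='')` rather than in `'rb'` mode.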

Upvotes: 1
