Dan

Reputation: 4663

Python RegEx Woes

I'm not sure why this isn't working:

import re
import csv

def check(q, s):
  match = re.search(r'%s' % q, s, re.IGNORECASE)
  if match:
    return True
  else:
    return False

tstr = []

# test strings
tstr.append('testthisisnotworking')
tstr.append('This is a TEsT')
tstr.append('This is a    TEST    mon!')

f = open('testwords.txt', 'rU')
reader = csv.reader(f)
for type, term, exp in reader:
  for i in range(2):
    if check(exp, tstr[i]):
      print exp + " hit on " + tstr[i]
    else:
      print exp + " did NOT hit on " + tstr[i]
f.close()

testwords.txt contains this line:

blah, blah, test

So essentially 'test' is the RegEx pattern. Nothing complex, just a simple word. Here's the output:

test did NOT hit on testthisisnotworking
test hit on This is a TEsT
test hit on This is a    TEST    mon!

Why does it NOT hit on the first string? I also tried \s*test\s* with no luck. Help?

Upvotes: 1

Views: 95

Answers (2)

Andrew Clark

Reputation: 208425

Adding print repr(exp) at the top of the first for loop shows that exp is ' test'. Note the leading space.

This isn't surprising, since csv.reader() splits only on commas and keeps the space that follows each one. Try changing your code to the following:
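To make the diagnosis concrete, here is a minimal sketch of that check. It assumes Python 3 (the question's code is Python 2) and uses an in-memory stand-in for testwords.txt holding the single line from the question; the names sample, no_space, and with_space are only for illustration.

```python
import csv
import io
import re

# Stand-in for testwords.txt (assumed to hold the one line shown in the question).
sample = io.StringIO("blah, blah, test\n")

for kind, term, exp in csv.reader(sample):
    # repr() makes the space that followed the comma visible.
    print(repr(exp))

    # The pattern is really ' test', so it needs a literal space before "test".
    no_space = re.search(exp, "testthisisnotworking", re.IGNORECASE)
    with_space = re.search(exp, "This is a TEsT", re.IGNORECASE)
    print(no_space is not None, with_space is not None)
```

The first test string contains no space at all, so a pattern that begins with one cannot match it, while the other two strings have a space directly before the word.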

for type, term, exp in reader:
  exp = exp.strip()
  for s in tstr:
    if check(exp, s):
      print exp + " hit on " + s
    else:
      print exp + " did NOT hit on " + s

Note that in addition to the strip() call, which removes the leading and trailing whitespace, I changed your second for loop to iterate directly over the strings in tstr instead of over a range. There was actually a bug in your original code: tstr contains three values, but you only checked the first two, because for i in range(2) only gives you i=0 and i=1.
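For anyone on Python 3, the same fix might look roughly like the sketch below. It is an adaptation rather than the code from the question: print is a function here, the file is replaced by an in-memory stand-in so the example runs on its own, and the results list is added only to make the outcome easy to inspect.

```python
import csv
import io
import re

def check(pattern, text):
    """Return True if the regex pattern matches anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

tstr = [
    "testthisisnotworking",
    "This is a TEsT",
    "This is a    TEST    mon!",
]

# In-memory stand-in for testwords.txt so the sketch runs without the file.
source = io.StringIO("blah, blah, test\n")

results = []  # illustrative only: collects (string, hit) pairs
for kind, term, exp in csv.reader(source):
    exp = exp.strip()  # drop the space left over after the comma
    for s in tstr:     # visit every test string instead of range(2)
        hit = check(exp, s)
        results.append((s, hit))
        print(exp + (" hit on " if hit else " did NOT hit on ") + s)
```

With the pattern stripped, all three test strings should report a hit.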

Upvotes: 3

Greg Hewgill

Reputation: 992857

By default, the csv module keeps the whitespace around each field in the input (this can be changed by using a different "dialect", for example one that sets skipinitialspace). So exp contains " test" with a leading space.

A quick way to fix this would be to add:

exp = exp.strip()

after you read from the CSV file.
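As an alternative to strip(), the dialect route mentioned above could be sketched like this. It assumes Python 3 and passes the skipinitialspace formatting parameter straight to csv.reader(); the in-memory source again stands in for testwords.txt.

```python
import csv
import io

# Stand-in for testwords.txt (assumed to hold the one line shown in the question).
source = io.StringIO("blah, blah, test\n")

# skipinitialspace=True tells the reader to ignore spaces right after a delimiter.
rows = list(csv.reader(source, skipinitialspace=True))
print(rows)
```

This only drops whitespace that immediately follows a comma. Trailing whitespace inside a field is left alone, so strip() remains the more thorough option if the file may contain both.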

Upvotes: 6
