adarshram

Reputation: 181

python printing regex matches produces empty lists

I am using regular expressions to match the names that follow "Dr." (and "As "). However, when I print the matches, they are printed as lists, and some of them are empty. I'm looking to print just the names. Code:

import re

f = open('qwert.txt', 'r')

lines = f.readlines()
for x in lines:
    p = re.findall(r'(?:Dr[.](\w+))', x)
    q = re.findall(r'(?:As (\w+))', x)
    print p
    print q

qwert.txt:

Dr.John and Dr.Keel
Dr.Tensa
Dr.Jees
As John winning Nobel prize
As Mary wins all prize
car
 tick me 3
 python.hi=is good
 dynamic 
 and precise

tickme 2 and its in it
 its rapid  
 its best
 well and easy

desired output:

John
Keel
Tensa
Jees
John
Mary

actual output:

['John', 'Keel']
[]
['Tensa']
[]
['Jees']
[]
[]
['John']
[]
['Mary']
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

Upvotes: 1

Views: 2472

Answers (4)

hwnd

Reputation: 70732

You need to iterate through your results.

Consider combining both patterns into one, so that findall() is called only once per line instead of twice.

>>> import re
>>> f = open('qwert.txt', 'r')
>>> for line in f:
...     matches = re.findall(r'(?:Dr\.|As )(\w+)', line)
...     for x in matches:
...         print x

John
Keel
Tensa
Jees
John
Mary
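A minimal Python 3 sketch of the same combined-pattern idea. The helper name extract_names is hypothetical, and it accepts any iterable of lines, so a short sample list stands in for qwert.txt here. The prefixes sit in a non-capturing group (?:...), so findall() returns only the text captured by (\w+).

```python
import re

# "Dr." or "As " is matched but not captured; only the name is captured,
# so findall() returns a list of plain name strings.
NAME_RE = re.compile(r'(?:Dr\.|As )(\w+)')


def extract_names(lines):
    """Yield every name found in an iterable of lines (hypothetical helper)."""
    for line in lines:
        # An empty list from findall() means this inner loop does nothing.
        for name in NAME_RE.findall(line):
            yield name


# Sample lines copied from the question's qwert.txt.
sample = [
    'Dr.John and Dr.Keel\n',
    'Dr.Tensa\n',
    'Dr.Jees\n',
    'As John winning Nobel prize\n',
    'As Mary wins all prize\n',
    'car\n',
]

for name in extract_names(sample):
    print(name)
```

With the real file, pass the open file object instead: with open('qwert.txt') as fh, loop over extract_names(fh).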

Upvotes: 1

Chris Seymour

Reputation: 85785

Simply check that the result of findall is non-empty before printing:

import re

with open('qwert.txt', 'r') as fh:
    for line in fh:
        res = re.findall(r'(?:Dr[.](\w+))', line)
        if res: 
            print '\n'.join(res)
        res = re.findall(r'(?:As (\w+))', line)
        if res:
            print '\n'.join(res)

This won't scale well if there are more than a couple of regular expressions. A more general approach might be:

import re
from functools import partial


def parseNames(regexs, line):
    """
    Returns a newline-separated string of matches given a
    list of regular expressions and a string to search
    """
    res = []
    for regex in regexs:
        # collect the matches of every pattern, then join once so that
        # names found by different patterns stay on separate lines
        res.extend(re.findall(regex, line))
    return '\n'.join(res)


regexs = [r'(?:Dr[.](\w+))', r'(?:As (\w+))']
match = partial(parseNames, regexs)

with open('qwert.txt', 'r') as fh:
    names = map(match, fh.readlines())
    print '\n'.join(filter(None, names))

Output:

John
Keel
Tensa
Jees
John
Mary
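If the list of patterns keeps growing, one option is to compile each pattern once and yield matches lazily instead of building strings. This is a Python 3 sketch under those assumptions: PATTERNS and iter_matches are hypothetical names, and a sample list stands in for qwert.txt. Within a single line, all matches of the first pattern come before those of the second, as in parseNames.

```python
import re

# Each pattern has exactly one capturing group, so findall() yields strings.
PATTERNS = [re.compile(p) for p in (r'Dr[.](\w+)', r'As (\w+)')]


def iter_matches(patterns, lines):
    """Yield every match of every pattern, line by line (hypothetical helper)."""
    for line in lines:
        for pattern in patterns:
            # findall() returns [] when nothing matches, so nothing is yielded.
            yield from pattern.findall(line)


sample = ['Dr.John and Dr.Keel', 'As Mary wins all prize', 'car']
print('\n'.join(iter_matches(PATTERNS, sample)))
```

Adding a pattern then means adding one entry to PATTERNS; nothing else changes.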

Upvotes: 2

Martijn Pieters

Reputation: 1121764

re.findall() always returns a list of matches, and that list can be empty. Loop over the result and print each element separately:

p = re.findall(r'(?:Dr[.](\w+))', x)
for match in p:
    print match
q = re.findall(r'(?:As (\w+))', x)
for match in q:
    print match

When a list is empty, the loop body never runs, so nothing is printed for that line.
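What findall() returns depends on how many capturing groups the pattern has, which is why the patterns in the question give lists of bare names. A small illustration in Python 3; the sample string is made up for the demo:

```python
import re

text = 'Dr.John and Dr.Keel'

# No capturing group: the whole match is returned.
print(re.findall(r'Dr\.\w+', text))      # ['Dr.John', 'Dr.Keel']
# Exactly one capturing group: only that group's text is returned.
print(re.findall(r'Dr\.(\w+)', text))    # ['John', 'Keel']
# Two or more groups: one tuple of groups per match.
print(re.findall(r'(Dr)\.(\w+)', text))  # [('Dr', 'John'), ('Dr', 'Keel')]
# No match: an empty list, which a for loop simply skips.
print(re.findall(r'Dr\.(\w+)', 'car'))   # []
```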

You could even do:

for match in re.findall(r'(?:Dr[.](\w+))', x):
    print match
for match in re.findall(r'(?:As (\w+))', x):
    print match

and forgo the use of the p and q variables.

Last but not least, you can combine the regular expressions into one:

for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
    print match

Demo:

>>> import re
>>> lines = '''\
... Dr.John and Dr.Keel
... Dr.Tensa
... Dr.Jees
... As John winning Nobel prize
... As Mary wins all prize
... car
...  tick me 3
...  python.hi=is good
...  dynamic 
...  and precise
... 
... tickme 2 and its in it
...  its rapid  
...  its best
...  well and easy
... '''.splitlines(True)
>>> for x in lines:
...     for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
...         print match
... 
John
Keel
Tensa
Jees
John
Mary
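A variation on the combined pattern, swapping in re.finditer(): it yields match objects one at a time instead of building a list, and m.group(1) pulls out the captured name. A Python 3 sketch; the sample lines are a subset of the question's file:

```python
import re

NAME_RE = re.compile(r'(?:Dr\.|As )(\w+)')

lines = ['Dr.John and Dr.Keel\n', 'car\n', 'As Mary wins all prize\n']

names = []
for line in lines:
    # finditer() yields nothing for lines without a match.
    for m in NAME_RE.finditer(line):
        names.append(m.group(1))  # group 1 is the captured name

print('\n'.join(names))
```

Working with match objects is handy if you later also need positions (m.start(), m.end()) or more than one group.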

Upvotes: 2

aa333

Reputation: 2576

The [] you see appear because re.findall returns a list of strings, and that list is empty for lines with no match. If you need the strings themselves, iterate over the result of findall.

p = re.findall(r'(?:Dr[.](\w+))', x)
q = re.findall(r'(?:As (\w+))', x)
for name in p + q:
    print name
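p + q builds a new list only to loop over it once. A Python 3 sketch using itertools.chain, which iterates over p and then q without concatenating them; x is a made-up sample line containing both prefixes:

```python
import re
from itertools import chain

x = 'Dr.Tensa and As Mary'  # sample line (hypothetical) with both prefixes

p = re.findall(r'Dr[.](\w+)', x)
q = re.findall(r'As (\w+)', x)

# chain() walks p first, then q; empty lists contribute nothing.
for name in chain(p, q):
    print(name)
```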

Upvotes: 2
