Reputation: 181
I am using a regex to match names following "Dr.". However, when I print the matches, they print as lists, and some of the lists are empty. I want to print just the names. Code:
import re
f = open('qwert.txt', 'r')
lines = f.readlines()
for x in lines:
    p=re.findall(r'(?:Dr[.](\w+))',x)
    q=re.findall(r'(?:As (\w+))',x)
    print p
    print q
qwert.txt:
Dr.John and Dr.Keel
Dr.Tensa
Dr.Jees
As John winning Nobel prize
As Mary wins all prize
car
tick me 3
python.hi=is good
dynamic
and precise
tickme 2 and its in it
its rapid
its best
well and easy
desired output:
John
Keel
Tensa
Jees
John
Mary
actual output:
['John', 'Keel']
[]
['Tensa']
[]
['Jees']
[]
[]
['John']
[]
['Mary']
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
Upvotes: 1
Views: 2472
Reputation: 70732
You need to iterate through your results.
Consider calling findall() once per line with a combined pattern, so it doesn't have to be repeated for each prefix on every iteration.
>>> import re
>>> f = open('qwert.txt', 'r')
>>> for line in f:
...     matches = re.findall(r'(?:Dr\.|As )(\w+)', line)
...     for x in matches:
...         print x
John
Keel
Tensa
Jees
John
Mary
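As a minimal sketch of why the combined pattern yields bare names: when a pattern contains exactly one capturing group, re.findall() returns only that group's text, and the (?:...) prefix is non-capturing, so it is matched but not returned. The sample lines are inlined here instead of reading qwert.txt, and print() is written as a function so the snippet also runs on Python 3; both are assumptions made for illustration.

```python
import re

# Sample lines inlined for illustration (a stand-in for qwert.txt).
lines = [
    "Dr.John and Dr.Keel",
    "Dr.Tensa",
    "As John winning Nobel prize",
    "car",
]

# (?:Dr\.|As ) is non-capturing; only the (\w+) group is returned by findall().
pattern = re.compile(r'(?:Dr\.|As )(\w+)')

names = []
for line in lines:
    # A line with no match returns an empty list, which adds nothing here.
    names.extend(pattern.findall(line))

for name in names:
    print(name)
```

Collecting into a flat list first also makes it easy to reuse the names later instead of only printing them.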
Upvotes: 1
Reputation: 85785
Simply test the result of findall before printing:
import re
with open('qwert.txt', 'r') as fh:
    for line in fh:
        res = re.findall(r'(?:Dr[.](\w+))', line)
        if res:
            print '\n'.join(res)
        res = re.findall(r'(?:As (\w+))', line)
        if res:
            print '\n'.join(res)
This won't scale nicely if you have more than a couple of regular expressions. A more useful approach may be:
import re
from functools import partial

def parseNames(regexs, line):
    """
    Returns a newline-separated string of matches given a
    list of regular expressions and a string to search
    """
    # Collect matches in a list so that names found by different
    # regexes on the same line are not glued together.
    res = []
    for regex in regexs:
        res.extend(re.findall(regex, line))
    return '\n'.join(res)

regexs = [r'(?:Dr[.](\w+))', r'(?:As (\w+))']
match = partial(parseNames, regexs)
with open('qwert.txt', 'r') as fh:
    names = map(match, fh.readlines())
# Drop the empty strings produced by lines without any match
print '\n'.join(filter(None, names))
Output:
John
Keel
Tensa
Jees
John
Mary
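A variation on the same idea, sketched with precompiled patterns and a generator so that names are yielded one at a time and no empty strings need filtering afterwards. The helper name iter_names is hypothetical, io.StringIO stands in for the open file, and print() is used as a function so the sketch also runs on Python 3.

```python
import io
import re

# The two patterns from the question, compiled once up front.
PATTERNS = [re.compile(r'Dr[.](\w+)'), re.compile(r'As (\w+)')]

def iter_names(patterns, lines):
    """Yield each captured name, line by line, pattern by pattern."""
    for line in lines:
        for pattern in patterns:
            for name in pattern.findall(line):
                yield name

# io.StringIO stands in for open('qwert.txt') to keep the sketch self-contained.
sample = io.StringIO(u"Dr.John and Dr.Keel\nAs Mary wins all prize\ncar\n")
result = list(iter_names(PATTERNS, sample))
print('\n'.join(result))
```

Adding another prefix then only means appending one more compiled pattern to PATTERNS.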
Upvotes: 2
Reputation: 1121764
re.findall() always returns a list of matches, and that list can be empty. Loop over the result and print each element separately:
p = re.findall(r'(?:Dr[.](\w+))', x)
for match in p:
    print match
q = re.findall(r'(?:As (\w+))', x)
for match in q:
    print match
Empty lists mean nothing will be printed.
You could even do:
for match in re.findall(r'(?:Dr[.](\w+))', x):
    print match
for match in re.findall(r'(?:As (\w+))', x):
    print match
and forgo the use of the p and q variables.
Last but not least, you can combine the regular expressions into one:
for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
    print match
Demo:
>>> import re
>>> lines = '''\
... Dr.John and Dr.Keel
... Dr.Tensa
... Dr.Jees
... As John winning Nobel prize
... As Mary wins all prize
... car
... tick me 3
... python.hi=is good
... dynamic
... and precise
...
... tickme 2 and its in it
... its rapid
... its best
... well and easy
... '''.splitlines(True)
>>> for x in lines:
...     for match in re.findall(r'(?:Dr\.|As )(\w+)', x):
...         print match
...
John
Keel
Tensa
Jees
John
Mary
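Since the combined pattern never needs to match across a line break (\w+ stops at whitespace), the per-line loop could arguably be dropped and findall() run once over the whole text. This is only a sketch under the assumption that the file is small enough to read into memory; the text variable below stands in for the result of f.read(), and print() is used as a function so it also runs on Python 3.

```python
import re

# Stand-in for f.read() on qwert.txt; assumes the file fits in memory.
text = (
    "Dr.John and Dr.Keel\n"
    "Dr.Tensa\n"
    "As Mary wins all prize\n"
    "car\n"
)

# One findall() call over the whole text returns names in document order.
all_names = re.findall(r'(?:Dr\.|As )(\w+)', text)

for name in all_names:
    print(name)
```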
Upvotes: 2
Reputation: 2576
The [] you see appear because findall returns a list of strings, and that list is empty when a line has no match. If you need the strings themselves, iterate over the result of findall.
p=re.findall(r'(?:Dr[.](\w+))',x)
q=re.findall(r'(?:As (\w+))',x)
# 'name' avoids shadowing the built-in str
for name in p+q:
    print name
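One thing to keep in mind with p+q: the names come out grouped by pattern, not in the order they appear on the line. The sketch below uses a made-up line (not from qwert.txt) that contains both prefixes to show the difference from a single combined pattern; print() is used as a function so it also runs on Python 3.

```python
import re

# Hypothetical line containing both prefixes, for illustration only.
x = "As Mary met Dr.John"

p = re.findall(r'(?:Dr[.](\w+))', x)
q = re.findall(r'(?:As (\w+))', x)
grouped = p + q  # all "Dr." names first, then all "As " names

# A single combined pattern keeps left-to-right order instead.
in_order = re.findall(r'(?:Dr\.|As )(\w+)', x)

print(grouped)
print(in_order)
```

For the sample file in the question no line mixes the two prefixes, so both approaches print the same names there.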
Upvotes: 2