Reputation: 4077
I have a list of sentences:
[ 'home twn cafe nr link rd',
'taj lands ends hotel..',
'SILVER PALACE705BPALI MALA ROADBANDRA WEST',
'turner rd lemon rd 4 fountain pali rd junctio...',
' FLAT 657 FLOOR AIR INDIA APTS 61B PALI HILL',
'bungalow 9 Mt Mary Bandra West',
'shabbir apt charklie rajan rd abv icici ban...',
'st peters church backyard loun hill rd',
'Union Park Road ',
'Flat 32 Building No 8',
'mehboob studio',
'ONGC Colony',
'Nargis Dutt Road Grand Canyon Building Appa']
I need to use re.findall to find every occurrence of the word 'rd' and replace it with 'road'. I tried this:
data2 = [nltk.sent_tokenize(lines) for lines in data]
c = [re.findall('nr',sent) for sent in data2]
and I got this error:
TypeError: expected string or buffer
How do I use re.findall inside a loop or comprehension? I don't know how to convert the items to strings. Please help.
Upvotes: 1
Views: 1424
Reputation: 239683
I would use a simple regex and a list comprehension, like this:
import re
# \b matches a word boundary, so only the standalone word "rd" is replaced
pattern = re.compile(r"\brd\b")
print([pattern.sub("road", line) for line in data])
Output
['home twn cafe nr link road',
'taj lands ends hotel..',
'SILVER PALACE705BPALI MALA ROADBANDRA WEST',
'turner road lemon road 4 fountain pali road junctio...',
' FLAT 657 FLOOR AIR INDIA APTS 61B PALI HILL',
'bungalow 9 Mt Mary Bandra West',
'shabbir apt charklie rajan road abv icici ban...',
'st peters church backyard loun hill road',
'Union Park Road ',
'Flat 32 Building No 8',
'mehboob studio',
'ONGC Colony',
'Nargis Dutt Road Grand Canyon Building Appa']
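As for the TypeError: nltk.sent_tokenize returns a list of sentences for each line, so every `sent` in the second comprehension is a list rather than a string, and re.findall only accepts a string. Here is a minimal sketch of calling re.findall once per string instead, plus a case-insensitive substitution in case 'RD' or 'Rd' should also become 'road' (that requirement is my assumption, it is not stated in the question). The `data` below is a shortened sample of the question's list, and the last entry is a hypothetical address added only to show the case handling.

```python
import re

# Shortened sample of the list from the question.
data = [
    'home twn cafe nr link rd',
    'turner rd lemon rd 4 fountain pali rd junctio...',
    'Union Park Road ',
    '12 Hill RD',  # hypothetical entry, not in the original data
]

word_rd = re.compile(r"\brd\b")

# re.findall needs a single string, so call it once per element of the list.
matches = [word_rd.findall(line) for line in data]

# Case-insensitive variant (assumption: upper-case 'RD' should be replaced too).
word_rd_any_case = re.compile(r"\brd\b", re.IGNORECASE)
replaced = [word_rd_any_case.sub("road", line) for line in data]

print(matches)
print(replaced)
```

Note that the word boundaries still leave 'ROADBANDRA' and 'Road' untouched, because 'rd' is not a standalone word there.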
Upvotes: 3