Hypothetical Ninja

Reputation: 4077

re.findall on each sentence of a list

I have got a list of sentences:

['home twn cafe nr link rd',
 'taj lands ends hotel..',
 'SILVER PALACE705BPALI MALA ROADBANDRA WEST',
 'turner rd lemon rd 4 fountain  pali rd junctio...',
 ' FLAT 657 FLOOR AIR INDIA APTS 61B PALI HILL',
 'bungalow 9 Mt Mary Bandra West',
 'shabbir apt charklie rajan rd abv icici ban...',
 'st peters church backyard loun hill rd',
 'Union Park Road ',
 'Flat 32 Building No 8',
 'mehboob studio',
 'ONGC Colony',
 'Nargis Dutt Road Grand Canyon Building Appa']

I need to use re.findall to find every occurrence of the word 'rd' and replace it with 'road'. I tried this:

data2 = [nltk.sent_tokenize(lines) for lines in data]  
c = [re.findall('nr',sent) for sent in data2]

and I got this error:

TypeError: expected string or buffer

How do I use re.findall in an iterative statement? I don't know how to convert each item to a string. Please help.

Upvotes: 1

Views: 1424

Answers (1)

thefourtheye

Reputation: 239683

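The error comes from nltk.sent_tokenize: it returns a list of sentences, so every element of data2 is a list, and re.findall is handed a list instead of a string. Your items are already strings, so you can skip the tokenizing step. Also note that re.findall only finds matches; to replace them, use re.sub. I would use a simple regex with word boundaries and a list comprehension, like this: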

import re

# \b marks a word boundary, so only the standalone word "rd" matches
pattern = re.compile(r"\brd\b")
print([pattern.sub("road", line) for line in data])

Output

['home twn cafe nr link road',
 'taj lands ends hotel..',
 'SILVER PALACE705BPALI MALA ROADBANDRA WEST',
 'turner road lemon road 4 fountain  pali road junctio...',
 ' FLAT 657 FLOOR AIR INDIA APTS 61B PALI HILL',
 'bungalow 9 Mt Mary Bandra West',
 'shabbir apt charklie rajan road abv icici ban...',
 'st peters church backyard loun hill road',
 'Union Park Road ',
 'Flat 32 Building No 8',
 'mehboob studio',
 'ONGC Colony',
 'Nargis Dutt Road Grand Canyon Building Appa']
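
If you only want to see what re.findall returns for each sentence, the same compiled pattern can be applied line by line. This is a minimal sketch, not the only way to do it: the three-item `data` below is just a sample taken from the list in the question, and the name `matches` is my own.

```python
import re

# Sample of the sentences from the question (assumed to be plain strings).
data = [
    'home twn cafe nr link rd',
    'taj lands ends hotel..',
    'turner rd lemon rd 4 fountain  pali rd junctio...',
]

# \b is a word boundary, so "rd" inside a longer word is not matched.
pattern = re.compile(r"\brd\b")

# findall is called once per string and returns a list of matches for it.
matches = [pattern.findall(line) for line in data]
print(matches)
```

Each inner list holds the matches for one sentence, and a sentence with no standalone 'rd' gives an empty list. The pattern is case-sensitive; passing re.IGNORECASE to re.compile would also match 'RD' and 'Rd'.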

Upvotes: 3
