Matching an expression using regex in Python

Question

I have many sentences, so I'd like to write a function that operates on each sentence individually; the input is just a string. My main objective is to extract the words that follow prepositions: in "near blue meadows", for example, I'd want "blue meadows" to be extracted.
I have all my prepositions in a text file. That part works fine, but I guess there's a problem in the regex I'm using. Here's my code:

import re

with open("Input.txt") as f:
    words = "|".join(line.rstrip() for line in f)
    pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
    text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
    print(pattern.search(text3).group())

This returns :

AttributeError                            Traceback (most recent call last)
 in ()
      5     pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
      6     text3 = ""
----> 7     print(pattern.search(text3).group())

AttributeError: 'NoneType' object has no attribute 'group'

The main problem is the regex. My expected output is "hennur police", i.e. the 2 words after "near". In my code I used ({}) to match any preposition from the list, \s for the space after it, (\d+\w+|\w+) for a word like "19th" or "hennur", and \s\w+ for another space and word. The regex fails to match, hence the AttributeError on None. Why is it not working?

The content of the Input.txt file:

['near','nr','opp','opposite','behind','towards','above','off']

Expected output:

hennur police

falsetru · Accepted Answer

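The file contains a Python list literal, not one preposition per line. Joining its lines therefore puts the whole line, brackets and quotes included, into the pattern, where ['near','nr',...] is read as a character class that matches a single character rather than as a list of words. Use ast.literal_eval to parse the literal.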

>>> import ast
>>> ast.literal_eval("['near','nr','opp','opposite','behind','towards','above','off']")
['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']
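To see the difference, here is a small sketch that builds both patterns from the file content quoted in the question. The line is hard-coded instead of being read from Input.txt, which is an assumption made only to keep the example self-contained. With the raw line, the [...] part acts as a character class, so on this text3 the pattern matches a lone comma plus two words; after ast.literal_eval the alternatives are real words.

```python
import ast
import re

# Assumed file content, copied from the question (a single line).
raw_line = "['near','nr','opp','opposite','behind','towards','above','off']"
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

# What the question's code effectively builds: the whole line is one
# "alternative", and [...] is a character class matching ONE character
# from the set ' , n e a r o p s i t b h d w v f.
broken = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(raw_line))
m = broken.search(text3)
print(m.group())  # , classic royale

# Parsing the literal first gives real alternatives: near|nr|opp|...
words = '|'.join(ast.literal_eval(raw_line))
fixed = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(words))
print(fixed.search(text3).group(1))  # hennur police
```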

import ast
import re

with open("Input.txt") as f:
    # Parse the list literal instead of joining the raw lines.
    words = '|'.join(ast.literal_eval(f.read()))

# Non-capturing group for the preposition; group 1 captures the two words after it.
pattern = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(words))
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

# If there could be multiple matches, use `findall` or `finditer`.
#   When the pattern has one capturing group, `findall` returns a list of
#   that group's contents instead of the entire matched strings.
for place in pattern.findall(text3):
    print(place)

# If you want only the first match, use `search`.
#   Use `group(1)` to get only group 1.
print(pattern.search(text3).group(1))

Output (the first line is printed by the for loop, the second comes from search(...).group(1)):

hennur police
hennur police
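Since the question asks for a function that works on one sentence at a time, here is a minimal sketch of one. The name words_after_prepositions is hypothetical, and the \b word boundaries are an addition not present in the answer above: they stop a preposition such as off from matching inside a longer word such as office.

```python
import re

# Preposition list from the question's Input.txt, already parsed.
PREPOSITIONS = ['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']


def words_after_prepositions(sentence, prepositions=PREPOSITIONS):
    """Return the two words following each preposition in `sentence`.

    Hypothetical helper: a sketch, not the accepted answer's code.
    """
    # re.escape guards against regex metacharacters in the word list.
    alternatives = '|'.join(map(re.escape, prepositions))
    # \b on both sides: a preposition must be a whole word, so 'off' does
    # not fire inside 'office' and 'opp' does not fire inside 'opposite'.
    pattern = re.compile(r'\b(?:{})\b\s+(\w+\s+\w+)'.format(alternatives))
    # One capturing group, so findall returns just the captured words.
    return pattern.findall(sentence)


print(words_after_prepositions("near blue meadows"))  # ['blue meadows']
print(words_after_prepositions(
    "003 canopy grace appt, classic royale garden, hennur main road, "
    "bangalore 43. near hennur police station"))  # ['hennur police']
print(words_after_prepositions("2nd floor, opposite city mall, off mg road"))  # ['city mall', 'mg road']
print(words_after_prepositions("head office road"))  # []
```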

NOTE: you need to re.escape each word if any word contains a character that has a special meaning in regular expressions.
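For instance, if the list contained an abbreviation such as opp. (a hypothetical entry, not in the question's file), the unescaped dot would match any character. This sketch compares the two ways of building the alternation:

```python
import re

# Hypothetical preposition list: 'opp.' contains '.', which in a regex
# means "any character" unless it is escaped.
preps = ['near', 'opp.', 'nr']

print(re.escape('opp.'))  # opp\.

unescaped = re.compile(r'(?:{})\s(\w+\s\w+)'.format('|'.join(preps)))
escaped = re.compile(r'(?:{})\s(\w+\s\w+)'.format('|'.join(map(re.escape, preps))))

# 'oppo' is matched by the unescaped 'opp.' but not by the escaped 'opp\.'.
for text in ["opp. city mall", "oppo city mall"]:
    print(text, '->', unescaped.findall(text), escaped.findall(text))
```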
