MadManoloz

Reputation: 73

Locate and extract a piece of string that contains a keyword from text in python

I am making a bot that looks through many comments, and I want to locate any sentence that starts with "I'm" or "I am". Here is an example comment (it contains two sentences I want to extract):

"Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."  

Here is the function I have so far:

keywords = ["i'm ","im ","i am "]

def get_quote(comments):
    quotes = []
    for comment in comments:
        isMatch = any(string in comment.text.lower() for string in keywords)
        if isMatch:

How can I locate where the sentence starts and ends so I can .append it to the list quotes?

Upvotes: 2

Views: 1028

Answers (2)

Dinesh Pundkar

Reputation: 4194

Check whether this code works for you:

def get_quote(comments):
    # Keywords are lower-case; each comment is lower-cased before matching.
    keywords = ["i'm ", "im ", "i am "]
    quotes = []
    for comment in comments:
        is_match = any(keyword in comment.lower() for keyword in keywords)
        if is_match:
            quotes.append(comment)
    print("Lines having keywords are ")
    for q in quotes:
        print(q)


if __name__ == "__main__":
    a = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
    # Remove the trailing "." so that splitting on "." leaves no empty last element
    a = a.rstrip(".")
    list_val = a.split(".")
    get_quote(list_val)

Output:

C:\Users\Administrator\Desktop>python demo.py
Lines having keywords are
 I'm sorry
 I'm sure everyone's day will come, it's just a matter of time

C:\Users\Administrator\Desktop>
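
Note that the code above matches a keyword anywhere in a fragment and splits only on ".". If the goal is sentences that begin with the keyword, something like the following might work. It is only a sketch: it assumes sentences end in ".", "?" or "!" followed by whitespace, and the names extract_sentences and PREFIXES are made up for illustration.

```python
import re

# Hypothetical prefix list: lower-case, each ending in a space so that
# a word such as "image" is not mistaken for "im".
PREFIXES = ("i'm ", "im ", "i am ")


def extract_sentences(text):
    """Return the sentences in `text` that start with one of PREFIXES."""
    # Split on whitespace that follows ".", "?" or "!"; the lookbehind
    # keeps the punctuation attached to the sentence it ends.
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    # str.startswith accepts a tuple of candidate prefixes.
    return [s for s in sentences if s.lower().startswith(PREFIXES)]


text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
quotes = extract_sentences(text)
for q in quotes:
    print(q)
```

Because the check is startswith rather than "in", a fragment like "Give him time." is no longer picked up by the "im " keyword.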

Upvotes: 2

tobias_k

Reputation: 82949

You can use regular expressions for this:

>>> import re
>>> text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time." 
>>> re.findall(r"(?i)(?:i'm|i am).*?[.?!]", text)
["I'm sorry.",
 "I'm sure everyone's day will come, it's just a matter of time."]

The pattern I use here is r"(?i)(?:i'm|i am).*?[.?!]"

  • (?i) set flag "ignore case"
  • (?:i'm|i am) "i'm" or (|) "i am", ?: means non-capturing group
  • .*? non-greedily (?) matches a sequence (*) of any characters (.) ...
  • [.?!] ... until finding a literal dot, question mark or exclamation mark.
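
The same breakdown can be written into the pattern itself with re.VERBOSE, which lets you put whitespace and comments inside the regular expression. This is just a sketch of the pattern above in that style; the name SENTENCE_RE is made up, and the space in "i am" is written as [ ] because verbose mode ignores bare whitespace.

```python
import re

SENTENCE_RE = re.compile(
    r"""
    (?:i'm|i[ ]am)   # "i'm" or "i am"; (?:...) is a non-capturing group
    .*?              # any characters, as few as possible ...
    [.?!]            # ... up to the first dot, question mark or exclamation mark
    """,
    re.IGNORECASE | re.VERBOSE,  # IGNORECASE plays the role of (?i)
)

text = "Oh, in that case. I'm sorry. I'm sure everyone's day will come, it's just a matter of time."
matches = SENTENCE_RE.findall(text)
print(matches)
```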

Note that this only works if the text contains no other dots, such as those in "Dr." or "Mr.", because those are treated as end-of-sentence too.
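
To make that limitation concrete, here is a small sketch showing the pattern cutting a sentence short at "Dr.". It also shows a possible variant (my assumption, not a complete solution) that only accepts a match at the start of the text or right after ".", "?" or "!" plus one whitespace character, since the original pattern also matches "I am" in the middle of a sentence. The names PATTERN and ANCHORED are made up for illustration.

```python
import re

# The pattern from above, with the flag passed separately instead of (?i).
PATTERN = re.compile(r"(?:i'm|i am).*?[.?!]", re.IGNORECASE)

# Hypothetical variant: require start of text, or sentence-ending punctuation
# plus one whitespace character, directly before the keyword. \b stops
# "i am" from matching the beginning of a longer word such as "amazed".
ANCHORED = re.compile(r"(?:^|(?<=[.?!]\s))(?:i'm|i am)\b.*?[.?!]", re.IGNORECASE)

# The abbreviation problem: the match stops at the dot in "Dr."
truncated = PATTERN.findall("I'm seeing Dr. Smith today.")

# A keyword in the middle of a sentence: matched by PATTERN, skipped by ANCHORED.
text = "Well, I think I am lost. I am sure of it."
anywhere = PATTERN.findall(text)
at_start = ANCHORED.findall(text)

print(truncated)
print(anywhere)
print(at_start)
```

The anchored variant still breaks on abbreviations; handling those reliably probably needs a proper sentence tokenizer rather than a single regular expression.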

Upvotes: 6
