Reputation: 63
Let's say the string representing the phrase is "Holy it is changing again and again". I want to print out the word "changing" that comes before "again and again", but this word may be different every time, so I need to extract whichever word precedes the phrase "again and again". The phrase "holy it is" should not be extracted.
How can I do that with Python?
I thought about using a regex, as in the question "Python regex to match word before <", but I'm not sure how to write it correctly.
Upvotes: 0
Views: 151
Reputation: 2603
To match any word followed by "again and again", use this regex:
([\w]*) again and again
([\w]* is the same as \w*; the square brackets just make it easy to add more characters.) If you want to include more characters, for example the apostrophe, replace [\w] with [\w'], and do the same for other characters inside the square brackets (some of them require escaping).
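A minimal sketch of widening the character class. It assumes we also want hyphenated words such as "re-used"; inside square brackets the hyphen has to be escaped (or placed first or last) so it is not read as a range. The sample phrase is made up for illustration.

```python
import re

phrase = "The plan was re-used again and again"

# \w alone stops at the hyphen, so only the last part of the word is captured
narrow = re.findall(r"([\w']*) again and again", phrase)

# adding an escaped hyphen to the class keeps the whole hyphenated word
wide = re.findall(r"([\w'\-]*) again and again", phrase)

print(narrow)  # ['used']
print(wide)    # ['re-used']
```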
To find all occurrences of that pattern, use re.findall(r"([\w']*) again and again", phrase), where ([\w']*) is any word (a sequence of word characters, including the apostrophe). It returns a list of all the words followed by "again and again".
import re

phrase = "Holy it is changing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['changing']

phrase = "Going again, going again and again, and finishing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['going', 'finishing']

phrase = "Defeated again and again! I got ninja'd again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['Defeated', "ninja'd"]
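One detail worth checking: because of the *, the group may match zero characters, so a phrase preceded by punctuation yields an empty string. The sketch below (sample phrase invented for illustration) compares it with + , which requires at least one word character; + is a suggested variation, not part of the answer above.

```python
import re

phrase = "He said it: again and again"

# with *, the group can be empty, so the space after ':' produces a match of ''
star = re.findall(r"([\w']*) again and again", phrase)

# with +, at least one word character must come right before the phrase
plus = re.findall(r"([\w']+) again and again", phrase)

print(star)  # ['']
print(plus)  # []
```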
Upvotes: 1
Reputation:
To start off, try this regex: "([Cc]hanging) again and again", which captures "changing" in a group. The [Cc] character class handles cases where "changing" is capitalized to "Changing".
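A small sketch of how the [Cc] class behaves. The phrases are invented, and re.IGNORECASE is swapped in only for comparison; it is a different approach from the one in this answer, since it makes every letter case-insensitive rather than just the first.

```python
import re

phrases = [
    "Holy it is changing again and again",
    "Changing again and again, it never stops",
    "CHANGING again and again",
]

def first_groups(pattern, texts):
    """Collect group 1 of the first match in each text that matches at all."""
    hits = []
    for text in texts:
        match = pattern.search(text)
        if match:
            hits.append(match.group(1))
    return hits

# [Cc] lets only the first letter vary, so the all-caps phrase is skipped
bracket = re.compile(r"([Cc]hanging) again and again")

# re.IGNORECASE relaxes every letter, including those in "again and again"
ignore_case = re.compile(r"(changing) again and again", re.IGNORECASE)

bracket_hits = first_groups(bracket, phrases)
ignore_case_hits = first_groups(ignore_case, phrases)

print(bracket_hits)      # ['changing', 'Changing']
print(ignore_case_hits)  # ['changing', 'Changing', 'CHANGING']
```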
To use a different word, replace ([Cc]hanging) with another word. For example, to capture "going" before "again and again", use ([Gg]oing) instead.
To match several different words followed by "again and again", including different forms of a word, use alternation (the | operator). To match "change", "changes", "changing", "changed", or "going", allowing for a capitalized first letter, the group becomes ([Cc]hange|[Cc]hanges|[Cc]hanging|[Cc]hanged|[Gg]oing)
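A sketch that runs the alternation above with re.findall on an invented phrase. The order of the alternatives does not break "changes" or "changed": when "change" is followed by "s" or "d", the following " again and again" fails and the engine backtracks to the next alternative. The pattern has no word boundary, though, so it also finds "changed" inside "exchanged"; the \b variant is a suggested addition, not part of the answer.

```python
import re

alternatives = r"([Cc]hange|[Cc]hanges|[Cc]hanging|[Cc]hanged|[Gg]oing)"

# the grouped alternation exactly as given above
pattern = re.compile(alternatives + r" again and again")

# suggested variant: \b stops a match from starting in the middle of a word
bounded = re.compile(r"\b" + alternatives + r" again and again")

phrase = (
    "Prices change again and again. It changes again and again, "
    "it changed again and again, and it keeps Going again and again."
)

words = pattern.findall(phrase)
print(words)  # ['change', 'changes', 'changed', 'Going']

print(pattern.findall("exchanged again and again"))  # ['changed']
print(bounded.findall("exchanged again and again"))  # []
```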
Upvotes: 1
Reputation: 2263
import re

text = '''
Holy it is changing again and again
Holy it is not changing again and again
Holy it has changed again and again
Holy it has changed once
Holy it used to change again and again
'''

# (\w+) captures the word immediately before " again and again"
prog = re.compile(r'(\w+) again and again')
for line in text.splitlines():
    x = prog.search(line)
    if x:
        print(x.group(1))
This outputs:
changing
changing
changed
change
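The per-line loop can also be collapsed into a single re.findall call over the whole text, sketched below with the same sample text. One difference to keep in mind: search returns only the first match per line, while findall collects every match, so a line containing the phrase twice would contribute two words.

```python
import re

text = '''
Holy it is changing again and again
Holy it is not changing again and again
Holy it has changed again and again
Holy it has changed once
Holy it used to change again and again
'''

prog = re.compile(r'(\w+) again and again')

# findall scans the whole string and returns group 1 of every match in order
words = prog.findall(text)
print(words)  # ['changing', 'changing', 'changed', 'change']
```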
Upvotes: 0