xvienxz2

Reputation: 63

Finding string before certain phrase

Let's say the string representing the phrase is "Holy it is changing again and again".

I want to print out the word "changing" that comes before "again and again", but this word may be different every time. So I need to extract whatever word comes immediately before the phrase "again and again". The leading part, "Holy it is", should not be extracted.

How can I do that with Python?

I thought about using a regex, as in the question "Python regex to match word before <", but I'm not sure how to write it correctly.

Upvotes: 0

Views: 151

Answers (3)

To match any word followed by "again and again", use this regex:

  • ([\w]*) again and again

If you want to include more characters, for example the apostrophe, replace [\w] with [\w']. The same goes for any other character you add inside the square brackets (some require escaping).

  • Holy it is changing again and again!
  • We are going to play again, and play again and again!
  • OMG again and again!
  • Let's go again and again. Again and again we go!
  • I got roomba'd again and again (requires adding ')
  • Foo became A-B-C again and again, Bar and Baz. (requires adding the escaped hyphen)
  • More sample regexes!
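As a rough sketch of how the character class changes what gets captured, here are two of the sample sentences above run through Python's re module. The names plain and extended are only for illustration.

```python
import re

# Word characters only: the capture cannot cross an apostrophe or a hyphen.
plain = re.compile(r"([\w]*) again and again")
# Apostrophe and escaped hyphen added inside the square brackets.
extended = re.compile(r"([\w'\-]*) again and again")

s1 = "I got roomba'd again and again"
s2 = "Foo became A-B-C again and again, Bar and Baz."

plain_hits = [plain.findall(s) for s in (s1, s2)]
extended_hits = [extended.findall(s) for s in (s1, s2)]

print(plain_hits)     # [['d'], ['C']]
print(extended_hits)  # [["roomba'd"], ['A-B-C']]
```

With the plain class only the last fragment of the word is captured, which is why the extra characters need to be added.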

To find all occurrences of that pattern, use re.findall:

match = re.findall(r"([\w']*) again and again", phrase)

Here ([\w']*) is any word (a sequence of word characters, including the apostrophe). The call returns a list of all the words followed by "again and again".

import re

# Raw strings keep the backslash in \w from being treated as a string escape.
phrase = "Holy it is changing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['changing']

phrase = "Going again, going again and again, and finishing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['going', 'finishing']

phrase = "Defeated again and again! I got ninja'd again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['Defeated', "ninja'd"]
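One thing to watch with the * quantifier: it also allows a zero-length word, so a phrase preceded by punctuation and a space can produce an empty string in the result. A small sketch of the difference, assuming you would rather skip those cases by using + (one or more) instead. The sample sentence is made up for the demonstration.

```python
import re

phrase = "Well, again and again it is changing again and again!"

# * accepts an empty "word" right before the first " again and again".
with_star = re.findall(r"([\w']*) again and again", phrase)
# + requires at least one character, so that occurrence is skipped.
with_plus = re.findall(r"([\w']+) again and again", phrase)

print(with_star)  # ['', 'changing']
print(with_plus)  # ['changing']
```

Whether the empty string is useful depends on what you do with the list afterwards, so treat the + variant as an option rather than a correction.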

Upvotes: 1

anon

Reputation:

To start off, try this regex: "([Cc]hanging) again and again". The word you want ends up in the capture group ([Cc]hanging). The [Cc] at the start handles cases where "changing" is capitalized as "Changing".

  • Holy it is changing again and again!
  • It is changing again and again, and it still changes
  • I am changing again and again and again, and still changing again and again!
  • Changing again and again and changing again and again!
  • Some more sample regexes
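In Python this could look roughly like the following, using two of the sample sentences above. The variable names are placeholders.

```python
import re

# [Cc] accepts both "changing" and "Changing"; the parentheses capture the word.
pattern = re.compile(r"([Cc]hanging) again and again")

first = pattern.findall("Holy it is changing again and again!")
second = pattern.findall("Changing again and again and changing again and again!")

print(first)   # ['changing']
print(second)  # ['Changing', 'changing']
```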

To use a different word, replace ([Cc]hanging) with another word. For example, to capture "going" before "again and again", use ([Gg]oing) instead.

  • We are going again and again and again!
  • Going again and again after multiple warnings will get you banned!
  • Going again and again, and going again and again, but going around in circles.
  • Some more sample regexes

To match several different words followed by "again and again", including different forms of a word, use alternation (the | operator). To match "change", "changes", "changing", "changed" and "going", with or without a capitalized first letter, the grouped part becomes ([Cc]hange|[Cc]hanges|[Cc]hanging|[Cc]hanged|[Gg]oing)

  • Holy it changed again and again!
  • It is changing again and again. Changes again and again still!
  • My score changes again and again, but now my score does not change or going anywhere!
  • Change again and again and again, just stop the change.
  • We are going and changing again and again and again!
  • Some more sample regexes
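A short sketch of the alternation version in Python, checked against three of the sample sentences above. The names pattern, samples and results are only illustrative.

```python
import re

# Each alternative is tried in order; the one that is followed by
# " again and again" is the one that ends up in the capture group.
pattern = re.compile(
    r"([Cc]hange|[Cc]hanges|[Cc]hanging|[Cc]hanged|[Gg]oing) again and again"
)

samples = [
    "Holy it changed again and again!",
    "It is changing again and again. Changes again and again still!",
    "We are going and changing again and again and again!",
]

results = [pattern.findall(s) for s in samples]
for r in results:
    print(r)
# ['changed']
# ['changing', 'Changes']
# ['changing']
```

In the last sentence "going" is not captured, because it is followed by "and changing" rather than "again and again".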

Upvotes: 1

keithpjolley

Reputation: 2263

import re

text = '''

Holy it is changing again and again
Holy it is not changing again and again
Holy it has changed again and again
Holy it has changed once
Holy it used to change again and again
'''

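# Capture the single word that comes right before " again and again".
prog = re.compile(r'(\w+) again and again')
for line in text.splitlines():
    x = prog.search(line)
    # search() returns None when the line does not contain the phrase.
    if x:
        print(x.group(1))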

This outputs:

changing
changing
changed
change

Upvotes: 0
