CodeConfusion

Reputation: 41

Using Regex to capture phrase

My question is regarding the following tweets:

Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX

JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to

I want to use a regex to remove "Randgold Resources Limited (RRS)" from the first tweet and "Occidental Petroleum Co (OXY)" from the second.

I am working in Python and so far I have tried this without much luck:

Trims\s[\w\s.()]+(?=Target)

In both cases I want to end up with the phrase "Trims Target Price". Any help would be appreciated.
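A quick way to see why the attempt misbehaves is to print what it matches. This is a minimal sketch: the tweets are copied from above (truncated as in the question), and the variable names are illustrative.

```python
import re

# The two tweets from the question, truncated as in the original post
tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

attempt = re.compile(r"Trims\s[\w\s.()]+(?=Target)")

for tweet in tweets:
    m = attempt.search(tweet)
    # The match starts at "Trims", so replacing it with "" deletes "Trims" as well
    print(repr(m.group()))
```

Because "Trims" is consumed by the match, `attempt.sub("", tweets[0])` gives "Credit Suisse Target Price to GBX"; the answers below keep "Trims" out of the match with a lookbehind.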

Upvotes: 0

Views: 85

Answers (3)

Kenly

Reputation: 26768

The (?<=...) lookbehind assertion ("match if preceded by") is missing for the word Trims; without it, Trims becomes part of the match and is replaced too:

import re
re.sub(r'(?<=Trims)\s[\w\s.()]+(?=Target)', ' ', text)
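Applied to both tweets, the one-liner gives the expected result. A minimal sketch; the `tweets` list and `cleaned` name are illustrative.

```python
import re

tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

# The lookbehind checks for "Trims" without consuming it; the match
# (leading space, company name, ticker, trailing space) becomes one space.
pattern = r"(?<=Trims)\s[\w\s.()]+(?=Target)"
cleaned = [re.sub(pattern, " ", t) for t in tweets]

for line in cleaned:
    print(line)
```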

Upvotes: 0

ndnenkov

Reputation: 36110

(?<=Trims )([A-Z][a-z]+ ){3}\([A-Z]{3}\)

The idea is:

  • (?<=Trims ) - find a position preceded by "Trims " using a positive lookbehind (the text is checked but not consumed)
  • [A-Z][a-z]+ - a word that starts with a capital letter followed by one or more lowercase letters
  • ([A-Z][a-z]+ ){3} - three such words, each followed by a space
  • \( and \) - parentheses have to be escaped; otherwise they would be treated as a capturing group
  • [A-Z]{3} - three capital letters (the ticker)
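Running the pattern on both tweets shows what it matches and what `re.sub` leaves behind. A minimal sketch; the variable names are illustrative.

```python
import re

tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

pattern = re.compile(r"(?<=Trims )([A-Z][a-z]+ ){3}\([A-Z]{3}\)")

for t in tweets:
    print(pattern.search(t).group())  # company name plus ticker
    print(pattern.sub("", t))         # the space after the ticker is not removed
```

Because the space after the ticker is outside the match, removing the match leaves two spaces between "Trims" and "Target"; appending a space to the pattern (`\([A-Z]{3}\) `) would absorb it. The `{3}` quantifiers also mean the pattern only fits names of exactly three capitalized words followed by a three-letter ticker.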

Upvotes: 1

anubhava

Reputation: 786291

You can use this lookaround-based regex:

import re
p = re.compile(r'(?<= Trims) .*?(?= Target )')
result = p.sub("", test_str)

(?<= Trims) .*?(?= Target ) lazily matches the text between Trims and Target, including the leading space, so removing it leaves "Trims Target".
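Checking the lookaround pattern against both tweets (a minimal sketch; `tweets` and `results` are illustrative names):

```python
import re

tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

# Lazy .*? stops at the first position followed by " Target "
p = re.compile(r"(?<= Trims) .*?(?= Target )")
results = [p.sub("", t) for t in tweets]

for line in results:
    print(line)
```

Note that the lookahead requires a space after "Target", so a tweet that ends right at "Target" would not be matched.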

Upvotes: 1
