Reputation: 41
My question is regarding the following tweets:
Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX
JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to
I want to remove "Randgold Resources Limited (RRS)" from the first tweet and "Occidental Petroleum Co (OXY)" from the second tweet using Regex.
I am working in Python and so far I have tried this without much luck:
Trims\s[\w\s.()]+(?=Target)
I want to capture the phrase "Trims Target Price" in both instances. Help would be appreciated.
Upvotes: 0
Views: 85
Reputation: 26768
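Your pattern is missing a positive lookbehind, (?<=...), for the word Trims. A lookbehind matches only if the current position is preceded by the given text, without consuming it. Without it, Trims is part of the match and gets removed along with the company name.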
re.sub(r'(?<=Trims)\s[\w\s.()]+(?=Target)', ' ', text)
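A quick way to check the substitution above against the two tweets from the question. This is only a sketch: the list name tweets and the variable cleaned are made up for the example, and the tweet text is copied as posted, truncation included.

```python
import re

# Sample tweets copied from the question; `tweets` is an illustrative name.
tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

# The lookbehind keeps "Trims" and the lookahead keeps "Target";
# only the text between them (including the surrounding spaces) is matched.
pattern = re.compile(r"(?<=Trims)\s[\w\s.()]+(?=Target)")

# Replace the match with a single space so "Trims" and "Target" stay separated.
cleaned = [pattern.sub(" ", t) for t in tweets]

for line in cleaned:
    print(line)
```

Note that the character class [\w\s.()] only covers letters, digits, underscores, whitespace, dots and parentheses, so a company name containing something like & or - between Trims and Target would not be matched by this pattern.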
Upvotes: 0
Reputation: 36110
(?<=Trims )([A-Z][a-z]+ ){3}\([A-Z]{3}\)
(?<=Trims ) - find a place preceded by "Trims " using a positive lookbehind
[A-Z][a-z]+ - a word starting with a capital letter, followed by one or more lowercase letters
([A-Z][a-z]+ ){3} - three such words, each followed by a space
\( and \) - parentheses have to be escaped, otherwise they are treated as a capturing group
[A-Z]{3} - three capital letters
Upvotes: 1
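For anyone trying the ([A-Z][a-z]+ ){3}\([A-Z]{3}\) pattern from the answer above, here is a small sketch of how it behaves on the two tweets from the question. The names tweets, raw and cleaned are invented for the example, and the whitespace clean-up step is an addition, not part of the answer.

```python
import re

# Sample tweets copied from the question; `tweets` is an illustrative name.
tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

# Exactly three capitalised words followed by a three-letter ticker in parentheses.
pattern = re.compile(r"(?<=Trims )([A-Z][a-z]+ ){3}\([A-Z]{3}\)")

# The match stops at ")", so the space before "Target" is left behind
# next to the space after "Trims", giving a double space.
raw = [pattern.sub("", t) for t in tweets]

# Extra step (an assumption, not in the answer): collapse runs of spaces.
cleaned = [re.sub(r" {2,}", " ", t) for t in raw]

for line in cleaned:
    print(line)
```

This pattern is deliberately strict: it only matches names made of exactly three capitalised words and a three-letter ticker, which happens to fit both sample tweets but may not fit others.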
Reputation: 786291
You can use this lookaround based regex:
import re
p = re.compile(r'(?<= Trims) .*?(?= Target )')
result = p.sub("", test_str)
(?<= Trims) .*?(?= Target ) will match any text that is between Trims and Target, leaving both words in place.
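To see exactly what this regex removes, the sketch below runs it on the two tweets from the question and shows both the matched text and the result. The names tweets, removed_parts and results are made up for the example.

```python
import re

# Sample tweets copied from the question; `tweets` is an illustrative name.
tweets = [
    "Credit Suisse Trims Randgold Resources Limited (RRS) Target Price to GBX",
    "JPMorgan Chase & Co Trims Occidental Petroleum Co (OXY) Target Price to",
]

p = re.compile(r"(?<= Trims) .*?(?= Target )")

# The match starts with the space after "Trims" and stops before " Target ",
# so one space is left between the two words after removal.
removed_parts = [p.search(t).group() for t in tweets]
results = [p.sub("", t) for t in tweets]

for removed, result in zip(removed_parts, results):
    print(repr(removed), "->", result)
```

Because .*? is lazy and accepts any character, this version does not care how many words the company name has or which punctuation it contains, as long as it sits between " Trims" and " Target ".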
Upvotes: 1