Reputation: 426
I am trying to create a regex that finds ticker symbols in bodies of text. However, I am struggling to get one that does everything I need.
Example:
This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.
So in this case I would like to match the following from the above:
I am trying to get:
I've tried:
\b[A-Z]{3,6}\b
but that matches pretty much every word
\$[^3-6\s]\S*
but that includes the $ and also ignores any ALL CAPS without a dollar sign
Upvotes: 3
Views: 1033
Reputation: 22032
Would you please try the following:
import re
s = 'This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.'
print(re.findall(r'(?<=\$)\w+|[A-Z]{3,6}', s))
Output:
['test', 'LIKE', 'YOU', 'FIND', 'STOck']
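For comparison, here is a quick sketch of what the two patterns from the question return on the same string when run through re.findall with no flags. The variable names caps_only and dollar_only are just for illustration:

```python
import re

s = ('This is a $test to show what I would LIKE to match. If $YOU look below '
     'you will FIND the list of simulated tickers ($STOck symbols) I would '
     'like to match.')

# First attempt from the question: whole words made of 3 to 6 capital letters.
caps_only = re.findall(r'\b[A-Z]{3,6}\b', s)

# Second attempt from the question: a dollar sign followed by non-whitespace.
dollar_only = re.findall(r'\$[^3-6\s]\S*', s)

print(caps_only)    # ['LIKE', 'YOU', 'FIND']
print(dollar_only)  # ['$test', '$YOU', '$STOck']
```

Neither pattern covers both cases on its own, which is why the pattern above joins two alternatives with |.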
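(?<=\$) is a positive lookbehind assertion: it requires a dollar sign immediately before the match without including that dollar sign in the result.
(Precisely speaking, it matches the position just after the dollar sign rather than the character itself.)
The second alternative, [A-Z]{3,6}, matches a run of three to six capital letters whether or not a dollar sign precedes it.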
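One caveat: without word boundaries, [A-Z]{3,6} can also match part of a longer or mixed-case word. The sketch below compares the pattern as posted with a variant that wraps the second alternative in \b. The sample string and the names loose and strict are made up for illustration, and whether the stricter behaviour is wanted depends on what should count as a ticker:

```python
import re

# Hypothetical sample text, not taken from the question.
sample = 'Buy $aapl and MSFT, but not HELlo or ABCDEFGHI.'

# Pattern as posted: the caps alternative can match inside a longer word.
loose = re.findall(r'(?<=\$)\w+|[A-Z]{3,6}', sample)

# Variant (an assumption, not part of the original answer): require word
# boundaries around the run of capital letters.
strict = re.findall(r'(?<=\$)\w+|\b[A-Z]{3,6}\b', sample)

print(loose)   # ['aapl', 'MSFT', 'HEL', 'ABCDEF', 'GHI']
print(strict)  # ['aapl', 'MSFT']
```

On the example string from the question both versions give the same result, because 'STOck' is picked up by the lookbehind alternative rather than by the capital-letter one.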
Upvotes: 1