Reputation: 51
I'm having trouble matching stock tickers in a string of text. I want a regular expression to match a space, 3 uppercase letters, and finally a space, period, OR question mark.
Below is the sample pattern that I created.
> `import re`
> `example = 'These are the tickers that I am trying to find: FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL'`
> `re.findall(r'[ ][A-Z]{3}[ .!?]', example)`
The regular expression misses quite a few of the matches.
Upvotes: 5
Views: 2565
Reputation: 7850
If you notice, there's a pattern to which items are missed. It's most obvious in the long section of non-punctuated symbols: it misses every other item.
This is because `re.findall()` finds non-overlapping matches, and your pattern is matching both the space before and after each match. That means after one item is matched, the initial space for the next item has already been gobbled up and cannot be used again.
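To make the skipping concrete, here is a small sketch that runs the pattern from the question over the sample string and prints where each match ends and what comes right after it. The names `pattern` and `matches` are mine, not from the question.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# The pattern from the question, written as a raw string.
pattern = re.compile(r'[ ][A-Z]{3}[ .!?]')

matches = []
for m in pattern.finditer(example):
    matches.append(m.group())
    # Show what immediately follows the match. After a match that ended
    # on a space, the next ticker starts here with no leading space left.
    following = example[m.end():m.end() + 4]
    print(m.span(), repr(m.group()), '-> next:', repr(following))
```

Every match that ends in a space is followed directly by the next ticker, so that ticker has no leading space available and is skipped. The final `TYL` is missed for a different reason: nothing follows it, so the trailing character class has nothing to match.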
Use word boundaries (`\b`) instead of matching leading/trailing spaces, and make your character class optional:
>>> re.findall(r'\b[A-Z]{3}\b[.!?]?',example)
['FAB.', 'APL', 'APL?', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
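For anyone who wants to run this outside the REPL, here is a self-contained sketch of the same idea. The second pattern is my own variation, not something the question asked for: it wraps the letters in a capture group so that `re.findall()` returns the bare tickers without trailing punctuation. The variable names `with_punct` and `tickers` are illustrative.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# \b is zero-width, so neighbouring matches never compete for the same space.
with_punct = re.findall(r'\b[A-Z]{3}\b[.!?]?', example)

# Assumed variation: with one capture group, findall returns only the group,
# which drops the optional trailing punctuation from each result.
tickers = re.findall(r'\b([A-Z]{3})\b[.!?]?', example)

print(with_punct)
print(tickers)
```

Because `\b` also matches at the end of the string, the last ticker is found even though nothing follows it. Keep in mind that this pattern accepts any standalone run of exactly three uppercase letters, so an ordinary all-caps word such as `USA` would match too.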
Upvotes: 7
Reputation: 279
I would use \s[A-Z]{3}[\s\.\?]
(Your regex includes "!" but your question does not mention it)
Upvotes: -1