chris302107

Reputation: 51

Regular Expression Matching Stock Ticker

I'm having trouble matching stock tickers in a string of text. I want a regular expression that matches a space, three uppercase letters, and finally a space, period, or question mark.

Below is the sample pattern that I created.

```python
import re

example = 'These are the tickers that I am trying to find: FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL'

re.findall('[ ][A-Z]{3}[ .!?]', example)
```

The regular expression misses quite a few of the tickers.

Upvotes: 5

Views: 2565

Answers (2)

glibdud

Reputation: 7850

If you look closely, there's a pattern to which items are missed. It's most obvious in the long run of unpunctuated tickers at the end: it misses every other one.

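This is because re.findall() returns non-overlapping matches, and your pattern consumes both the space before each ticker and the character after it. Once one ticker has been matched, the space in front of the next ticker has already been used as the previous match's trailing character and cannot be matched again. The last ticker, TYL, is missed for a related reason: nothing follows it, so the trailing `[ .!?]` has no character to match.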
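A minimal sketch to make the explanation concrete, reusing the question's `example` string. It prints what the original pattern returns, then shows one alternative fix that keeps the leading space but checks the trailing character with a lookahead `(?=...)`, which does not consume it. The lookahead version is my own illustration, not the approach recommended in this answer; `consuming` and `lookahead` are just illustrative variable names.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# Original pattern: consumes the leading space AND the trailing character,
# so a space shared by two neighbouring tickers can only be used once.
consuming = re.findall(r'[ ][A-Z]{3}[ .!?]', example)
print(consuming)
# [' FAB.', ' APL ', ' GJA ', ' AKE ', ' ZKE ']

# Lookahead variant: (?=[ .!?]|$) only checks what follows (or end of string)
# without consuming it, so the next ticker's leading space is still available.
# The capturing group makes findall return just the three letters.
lookahead = re.findall(r' ([A-Z]{3})(?=[ .!?]|$)', example)
print(lookahead)
# ['FAB', 'APL', 'APL', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
```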

Use word boundaries (\b) instead of matching the leading and trailing spaces, and make the trailing punctuation class optional:

```python
>>> re.findall(r'\b[A-Z]{3}\b[.!?]?', example)
['FAB.', 'APL', 'APL?', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
```
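If only the symbols themselves are wanted, dropping the optional punctuation from the word-boundary pattern gives bare tickers. A sketch, where `find_tickers` and `TICKER` are hypothetical names; the second call uses an illustrative input showing that `\b` also matches at the start of the string and next to `(`, and that any three-letter all-caps word (such as `CEO`) is picked up too.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# \b is zero-width: it asserts a word/non-word transition without consuming
# any character, so adjacent tickers never compete for a shared space.
TICKER = re.compile(r'\b[A-Z]{3}\b')


def find_tickers(text):
    """Return every standalone three-letter uppercase token (hypothetical helper)."""
    return TICKER.findall(text)


print(find_tickers(example))
# ['FAB', 'APL', 'APL', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']

# Boundaries also hold at the start of the string and beside punctuation,
# but any all-caps three-letter word is matched as well.
print(find_tickers('GJA rose (APL fell), but the CEO of SPR?'))
# ['GJA', 'APL', 'CEO', 'SPR']
```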

Upvotes: 7

GermanC

Reputation: 279

I would use `\s[A-Z]{3}[\s\.\?]` (you include "!" in your regex but not in your description)
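A quick check, run on the question's `example` string, suggests why this pattern does not help: `\s` replaces the literal space, but the pattern still consumes a whitespace character on both sides of each ticker, so it skips every other ticker exactly as the original did. `result` is an illustrative variable name.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# Leading \s and trailing [\s\.\?] are both consumed, so two neighbouring
# tickers still compete for the single space between them.
result = re.findall(r'\s[A-Z]{3}[\s\.\?]', example)
print(result)
# [' FAB.', ' APL ', ' GJA ', ' AKE ', ' ZKE ']
```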

Upvotes: -1
