Reputation: 2014

Regular expression for stock tickers - Python

I have a list of tweets. They look like this:

data = [['trading $aa $BB stock market info'],
        ['$aa is $116 market is doing well $cc $ABC']]

I want to extract stock tickers:

['$aa', '$BB']
['$aa', '$cc', '$ABC']

I have tried this:

import re

for i in data:
    print(re.findall(r'[$]\S*', str(i)))

But the output also contains $116:

['$aa', '$BB']
['$aa', '$116', '$cc', '$ABC']

Any suggestions?

Upvotes: 2

Answers (3)

Asclepius

Reputation: 63453

The reticker package does this by building a regular expression from its configuration and using that pattern to extract tickers from text. The compiled pattern is also exposed, so it can be used on its own.

>>> import reticker

>>> extractor = reticker.TickerExtractor()
>>> type(extractor.pattern)
<class 're.Pattern'>

>>> reticker.TickerExtractor().extract("Comparing FNGU vs $WEBL vs SOXL- who wins? And what about $cldl vs $Skyu? BTW, will the $w+Z pair still grow? IMHO, SOXL is king! [V]isa is A-okay!")
['FNGU', 'WEBL', 'SOXL', 'CLDL', 'SKYU', 'W', 'Z', 'V', 'A']

>>> reticker.TickerExtractor().extract("Which of BTC-USD, $ETH-USD and $ada-usd is best?\nWhat about $Brk.a and $Brk.B? Compare futures MGC=F and SIL=F.")
['BTC-USD', 'ETH-USD', 'ADA-USD', 'BRK.A', 'BRK.B', 'MGC=F', 'SIL=F']

Upvotes: 3

Tom Sawyer

Reputation: 123

I'll just leave this here for people looking for a regex that matches a stock ticker:

re.fullmatch('([A-Za-z]{1,5})(-[A-Za-z]{1,2})?', symbol)
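Note that fullmatch validates a whole string rather than searching inside a tweet, so the symbol has to be isolated (and stripped of any leading $) first. A minimal sketch of how this pattern might be used as a filter; the helper name is_ticker and the sample candidates are made up for illustration:

```python
import re

# Pattern from the answer above: 1-5 letters, optionally followed by
# a hyphen and a 1-2 letter suffix (for example a share class).
SYMBOL = re.compile(r'([A-Za-z]{1,5})(-[A-Za-z]{1,2})?')

def is_ticker(symbol):
    """Return True if the entire string looks like a ticker symbol."""
    return SYMBOL.fullmatch(symbol) is not None

# Hypothetical candidates; only the first two satisfy the pattern.
candidates = ['AAPL', 'brk-b', '116', 'TOOLONG', '$aa']
valid = [s for s in candidates if is_ticker(s)]
print(valid)
```

The pattern is purely syntactic: it accepts any short run of letters, so ordinary words such as "stock" pass as well.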

Upvotes: 3

Harald Nordgren

Reputation: 12409

Match the dollar sign, one letter, and then anything that's not a space:

re.findall(r'[$][A-Za-z][\S]*', str(i))
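One caveat: str(i) turns the whole one-element list into text, so a ticker at the very end of a tweet can pick up the trailing quote and bracket. Here is a sketch of a stricter variant applied to the data from the question. It assumes a ticker is a dollar sign followed by letters only, which is my assumption and not something the question states; the name extract_tickers is likewise just illustrative.

```python
import re

# Each tweet is wrapped in a one-element list, as in the question.
data = [['trading $aa $BB stock market info'],
        ['$aa is $116 market is doing well $cc $ABC']]

# Assumption: a ticker is "$" followed by letters only, ending at a
# word boundary. This skips amounts like $116.
TICKER = re.compile(r'\$[A-Za-z]+\b')

def extract_tickers(tweet):
    """Return every $-prefixed, letters-only token in the tweet."""
    return TICKER.findall(tweet)

# Index into each inner list instead of calling str() on it, so the
# regex sees the tweet text and not the list's repr.
results = [extract_tickers(row[0]) for row in data]
for tickers in results:
    print(tickers)
```

Tickers containing dots or hyphens (such as $BRK.B) would be cut short by this pattern; the character class would need to be widened to support them.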

Upvotes: 5
