Reputation: 2014
I have a list of tweets. They look like this:
data = [['trading $aa $BB stock market info'],
['$aa is $116 market is doing well $cc $ABC']]
I want to extract stock tickers:
['$aa', '$BB']
['$aa', '$cc', '$ABC']
I have tried this:
import re
for i in data:
    print(re.findall(r'[$]\S*', i[0]))
And, the output contains $116 as well:
['$aa', '$BB']
['$aa', '$116', '$cc', '$ABC']
Any suggestions?
Upvotes: 2
Views: 6204
Reputation: 63453
The package reticker does this by building a custom regular expression from its configuration and using that pattern to extract tickers from text. Alternatively, the returned pattern can be used independently.
>>> import reticker
>>> extractor = reticker.TickerExtractor()
>>> type(extractor.pattern)
<class 're.Pattern'>
>>> reticker.TickerExtractor().extract("Comparing FNGU vs $WEBL vs SOXL- who wins? And what about $cldl vs $Skyu? BTW, will the $w+Z pair still grow? IMHO, SOXL is king! [V]isa is A-okay!")
['FNGU', 'WEBL', 'SOXL', 'CLDL', 'SKYU', 'W', 'Z', 'V', 'A']
>>> reticker.TickerExtractor().extract("Which of BTC-USD, $ETH-USD and $ada-usd is best?\nWhat about $Brk.a and $Brk.B? Compare futures MGC=F and SIL=F.")
['BTC-USD', 'ETH-USD', 'ADA-USD', 'BRK.A', 'BRK.B', 'MGC=F', 'SIL=F']
Upvotes: 3
Reputation: 123
I'll just leave this here for people looking for a regex that matches a bare stock ticker symbol (no dollar sign):
re.fullmatch(r'([A-Za-z]{1,5})(-[A-Za-z]{1,2})?', symbol)
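For anyone who wants to try it, here is a minimal sketch of that pattern used as a validator. The helper name is_ticker and the sample symbols are my own, purely for illustration; the pattern is the one from this answer.

```python
import re

# Pattern from this answer: 1-5 letters, optionally a hyphen plus 1-2 letters.
TICKER_RE = re.compile(r'([A-Za-z]{1,5})(-[A-Za-z]{1,2})?')

def is_ticker(symbol):
    """Return True if the whole string looks like a ticker (hypothetical helper)."""
    return TICKER_RE.fullmatch(symbol) is not None

# Sample inputs chosen for illustration only.
for candidate in ['aa', 'ABC', 'BRK-B', '116', 'TOOLONG', '$aa']:
    print(candidate, is_ticker(candidate))
```

Because fullmatch requires the entire string to match, this validates a single symbol at a time; it will not find tickers inside a longer tweet, and it rejects a leading dollar sign.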
Upvotes: 3
Reputation: 12409
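Match the dollar sign, one letter, and then anything that's not a space. Pass the tweet string itself (i[0]) rather than str(i); otherwise the closing '] of the list's string form gets glued onto a ticker that ends the tweet:
re.findall(r'[$][A-Za-z]\S*', i[0])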
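Applied to the data from the question, the idea might look like the sketch below. The names TICKER and extract_tickers are hypothetical, and it assumes each tweet is a one-element list, as in the question.

```python
import re

data = [['trading $aa $BB stock market info'],
        ['$aa is $116 market is doing well $cc $ABC']]

# Dollar sign, then one letter, then any run of non-whitespace characters.
TICKER = re.compile(r'[$][A-Za-z]\S*')

def extract_tickers(tweets):
    """Return one list of tickers per tweet (each tweet is a one-element list)."""
    return [TICKER.findall(tweet[0]) for tweet in tweets]

for tickers in extract_tickers(data):
    print(tickers)
# ['$aa', '$BB']
# ['$aa', '$cc', '$ABC']
```

Requiring a letter right after the dollar sign is what drops $116. Note that \S* will still swallow trailing punctuation, so a tweet containing "$aa," would yield '$aa,'.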
Upvotes: 5