Reputation: 89
I would like to extract values from the strings in a list based on a certain pattern.
**Example:**
ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
**Expected Output:**
Tickercode-HF;BPO
StockCode-NYSE;NEW YORK
Relevancescore-81;0
**My code**:
Tickercode=[x for x in ticker if re.match(r'[\w\.-]+[\w\.-]+', x)]
Stockcode=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]
Relevancescore=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]
**My output:**
['HF (NYSE) (81%);BPO (NEW YORK)]']
[]
[]
But I am getting the wrong output. Please help me resolve the issue.
Thanks
Upvotes: 0
Views: 2847
Reputation: 8609
First, each item of ticker contains multiple records separated by semicolons, so I recommend normalizing ticker by splitting each item on ';'. Then iterate over the resulting strings and extract the info using the pattern '(\w+) \(([\w ]+)\)( \(([\d]+)%\))?'.
import re
ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
ticker=[y for x in ticker for y in x.split(';')]
Tickercode=[]
Stockcode=[]
Relevancescore=[]
for s in ticker:
    m = re.search(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?', s)
    Tickercode.append(m.group(1))
    Stockcode.append(m.group(2))
    Relevancescore.append(m.group(4))
print(Tickercode)
print(Stockcode)
print(Relevancescore)
Output:
['HF', 'BPO']
['NYSE', 'NEW YORK']
['81', None]
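For readers who find the pattern hard to parse, here is a minimal sketch of the same regex written with re.VERBOSE so each part can be commented. The group names (code, exchange, score) are illustrative additions, not part of the original answer, and the optional score is wrapped in a non-capturing group, so the groups are read by name rather than by number.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE.
# In verbose mode bare spaces are ignored, so a literal space is written as [ ].
# The group names are hypothetical, chosen for this sketch only.
RECORD_RE = re.compile(r"""
    (?P<code>\w+)                  # ticker code, e.g. HF
    [ ]\((?P<exchange>[\w ]+)\)    # exchange in parentheses, e.g. (NYSE)
    (?:[ ]\((?P<score>\d+)%\))?    # optional relevance score, e.g. (81%)
""", re.VERBOSE)

for record in ['HF (NYSE) (81%)', 'BPO (NEW YORK)]']:
    m = RECORD_RE.search(record)
    # score is None when the optional part is absent
    print(m.group('code'), m.group('exchange'), m.group('score'))
```

The second record has no percentage, so its score group comes back as None, which is why the last list above ends with None.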
Update:
Use re.search instead of re.match, because re.match only matches the pattern at the start of the string. Your input has leading whitespace, which caused the match to fail.
You can add this inside the loop, right after the re.search call, to print any string that doesn't match and skip it:
    if m is None:
        print('%s cannot be matched' % s)
        continue
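Putting the pieces together, the following is one possible sketch of the whole loop with the None check in place. The function name parse_tickers and the default_score parameter are hypothetical; substituting '0' for a missing score is an assumption taken from the expected output in the question.

```python
import re

# The pattern from this answer, compiled once.
RECORD_RE = re.compile(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?')

def parse_tickers(items, default_score='0'):
    """Split each item on ';' and return three parallel lists.

    parse_tickers and default_score are hypothetical names for this sketch.
    """
    codes, exchanges, scores = [], [], []
    for item in items:
        for record in item.split(';'):
            m = RECORD_RE.search(record)
            if m is None:
                # Report and skip records the pattern cannot handle.
                print('%s cannot be matched' % record)
                continue
            codes.append(m.group(1))
            exchanges.append(m.group(2))
            # group(4) is None when no "(NN%)" part is present.
            scores.append(m.group(4) or default_score)
    return codes, exchanges, scores

ticker = ['HF (NYSE) (81%);BPO (NEW YORK)]']
codes, exchanges, scores = parse_tickers(ticker)
print('Tickercode-' + ';'.join(codes))
print('StockCode-' + ';'.join(exchanges))
print('Relevancescore-' + ';'.join(scores))
```

Keeping the scores as strings makes the final join straightforward; convert them with int() if you need numbers.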
Upvotes: 3
Reputation: 4070
The problem with your code is that you're building up each of your lists from the input itself. You're telling it, "make a list of the input if the input matches my regular expression", so a successful match keeps the whole input string rather than the matched part. In addition, re.match() only matches against the beginning of a string, so the only regex that matches is the one that matches against the ticker symbol itself.
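Both points can be checked in a few lines. This is a small illustrative sketch, not the asker's code; the variable names and the simplified patterns are assumptions made for the demonstration.

```python
import re

s = 'HF (NYSE) (81%);BPO (NEW YORK)]'
score_re = re.compile(r'\((\d+)%\)')

# re.match only tries position 0; the string starts with 'H', not '('.
print(score_re.match(s))            # None
# re.search scans forward until the pattern fits somewhere.
print(score_re.search(s).group(1))  # 81

# A filtering comprehension keeps the original element, not the matched text.
kept = [x for x in [s] if re.match(r'[\w\.-]+', x)]
print(kept == [s])                  # True

# To collect the matched text, read it from the match object instead.
matched = [m.group(0) for m in (re.match(r'[\w\.-]+', x) for x in [s]) if m]
print(matched)                      # ['HF']
```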
I've reorganized your code a bit below to show how it can work. The key is to break up your input so you're only handling one group at a time.
#!/usr/bin/env python
import re
# Example:
ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
# Expected output:
# Tickercode-HF;BPO
# StockCode-NYSE;NEW YORK
# Relevancescore-81;0
tickercode=[]
stockcode=[]
relevancescore=[]
ticker_re = re.compile(r'^\s*([A-Z]+)')
stock_re = re.compile(r'\(([\w ]+)\)')
relevance_re = re.compile(r'\((\d+)%\)')
for tick in ticker:
    for stockinfo in tick.split(";"):
        # Each search looks at one "CODE (EXCHANGE) (NN%)" record only.
        ticker_match = ticker_re.search(stockinfo)
        stock_match = stock_re.search(stockinfo)
        relevance_match = relevance_re.search(stockinfo)
        # Fall back to a default when a part is missing.
        ticker_code = ticker_match.group(1) if ticker_match else ''
        stock_code = stock_match.group(1) if stock_match else ''
        relevance_score = relevance_match.group(1) if relevance_match else '0'
        tickercode.append(ticker_code)
        stockcode.append(stock_code)
        relevancescore.append(relevance_score)
# print() with a single argument works on both Python 2 and Python 3.
print('Tickercode-' + ';'.join(tickercode))
print('StockCode-' + ';'.join(stockcode))
print('Relevancescore-' + ';'.join(relevancescore))
Upvotes: 0