Extract values based on a pattern in a list python

Question

I would like to extract values based on certain pattern in a list.

**Example:**
ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']

**Expected Output:**
Tickercode-HF;BPO
StockCode-NYSE;NEW YORK
Relevancescore-81;0

**My code**:
Tickercode=[x for x in ticker if re.match(r'[\w\.-]+[\w\.-]+', x)]
Stockcode=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]
Relevancescore=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]

**My output:**
['HF (NYSE) (81%);BPO (NEW YORK)]']
[]
[]

But I am getting the wrong output. Please help me resolve the issue.

Thanks

gzc · Accepted Answer

First, each item of ticker contains multiple records separated by semicolons, so I recommend normalizing ticker by splitting each item on ';'. Then iterate over the resulting strings and extract the fields using the pattern '(\w+) \(([\w ]+)\)( \(([\d]+)%\))?'.

import re

ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
ticker=[y for x in ticker for y in x.split(';')]

Tickercode=[]
Stockcode=[]
Relevancescore=[]

for s in ticker:
    m = re.search(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?', s)
    Tickercode.append(m.group(1))
    Stockcode.append(m.group(2))
    Relevancescore.append(m.group(4))

print(Tickercode)
print(Stockcode)
print(Relevancescore)

Output:

['HF', 'BPO']
['NYSE', 'NEW YORK']
['81', None]
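
To get from these lists to the semicolon-joined format shown in the question, one possible sketch follows. It assumes a missing relevance score should be reported as 0, as in the expected output, and that records not fitting the pattern can be skipped. The helper name `format_fields` is hypothetical, not part of the original answer.

```python
import re

# Same pattern as above: ticker code, exchange in parentheses,
# then an optional "(NN%)" relevance score.
PATTERN = re.compile(r'(\w+) \(([\w ]+)\)( \((\d+)%\))?')

def format_fields(ticker):
    """Hypothetical helper: return the three semicolon-joined strings."""
    codes, exchanges, scores = [], [], []
    for item in ticker:
        for record in item.split(';'):
            m = PATTERN.search(record)
            if m is None:
                continue  # assumption: skip records that do not fit the pattern
            codes.append(m.group(1))
            exchanges.append(m.group(2))
            scores.append(m.group(4) or '0')  # assumption: missing score becomes 0
    return ';'.join(codes), ';'.join(exchanges), ';'.join(scores)

ticker = ['HF (NYSE) (81%);BPO (NEW YORK)]']
tickercode, stockcode, relevancescore = format_fields(ticker)
print('Tickercode-' + tickercode)          # Tickercode-HF;BPO
print('StockCode-' + stockcode)            # StockCode-NYSE;NEW YORK
print('Relevancescore-' + relevancescore)  # Relevancescore-81;0
```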

Update:

This uses re.search instead of re.match, because re.match only matches at the start of the string. Your input has leading whitespace, which caused the match to fail.
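
The difference can be seen in a small sketch. The input string here is a made-up example with a leading space, not the asker's actual data.

```python
import re

pattern = r'(\w+) \(([\w ]+)\)'
s = ' HF (NYSE) (81%)'  # hypothetical input with a leading space

matched = re.match(pattern, s)    # anchored at position 0, where the space is
searched = re.search(pattern, s)  # scans forward until the pattern fits

print(matched)            # None
print(searched.group(1))  # HF
```

Calling `s.strip()` before `re.match` would be another way to handle the leading whitespace.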

You can add this check right after the re.search call, inside the loop, to print any string that doesn't match and skip it:

    if m is None:
        print('%s cannot be matched' % s)
        continue
