Reputation: 3041
I have the following string:
Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl
I'd like to obtain:
[BBC World Service, 57], [Grameen Bank, 54], [Cninsure Inc., 66], [Mn-Public-Radio-Intl, np.nan]
I was using this pattern .+?(?=\()
But it is inadequate because Mn-Public-Radio-Intl doesn't have parentheses.
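A quick check shows why the lookahead pattern falls short. The lookahead (?=\() only succeeds where a "(" follows, so the trailing name is never matched. Each later match also starts at the previous "(" and drags in the percentage from the item before it. This is a minimal sketch; the variable name s is chosen here for illustration.

```python
import re

s = 'Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl'

# Lazily match any characters, stopping just before each '('
matches = re.findall(r'.+?(?=\()', s)
print(matches)
# ['Bbc ', '(57%); Grameen ', '(54%); Cninsure ']
```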
Would appreciate help!
Upvotes: 0
Views: 37
Reputation: 1122482
Make the parenthesized percentage optional. Where it is missing, findall returns an empty string for that group:
re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)
Demo:
>>> re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)
[('Bbc', '57'), ('Grameen', '54'), ('Cninsure', '66'), ('Mn-Public-Radio-Intl', '')]
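With re.VERBOSE, the same pattern can carry a comment for each piece. This is only a restatement for readability. The compiled regex matches exactly like the one-liner above, because VERBOSE ignores whitespace and comments except inside a character class, so the space in [\w -] is still matched literally.

```python
import re

PATTERN = re.compile(r"""
    (\b[\w -]+\b)        # group 1: a run of word characters, spaces and hyphens,
                         # trimmed so it starts and ends on a word character
    (?:                  # optional, non-capturing wrapper for the percentage
        \s+\(            # whitespace followed by a literal '('
        (\d+)            # group 2: the digits of the percentage
        %\)              # literal '%)'
    )?                   # the whole percentage part may be absent
""", re.VERBOSE)

inputtext = 'Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl'
print(PATTERN.findall(inputtext))
```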
To get integers, or NaN (np.nan, i.e. float('nan')) where the percentage is missing, post-process the matches:
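import re
import numpy as np
inputtext = 'Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl'
# Convert each captured percentage to int; use NaN where the group was empty
[(name, int(perc) if perc else np.nan)
 for name, perc in re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)]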
which then gives:
>>> [(name, int(perc) if perc else np.nan)
... for name, perc in re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)]
[('Bbc', 57), ('Grameen', 54), ('Cninsure', 66), ('Mn-Public-Radio-Intl', nan)]
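findall reports a missing group as an empty string. If you would rather test for absence explicitly, re.finditer yields match objects whose group() returns None for a group that did not take part in the match. The sketch below builds [name, percentage] lists in the shape the question asks for, with names exactly as they appear in the string. parse_entries is a hypothetical helper name. It uses math.nan instead of np.nan so the example needs only the standard library. Both are ordinary float NaN values.

```python
import math
import re

# Same pattern as in the answer: a name, optionally followed by " (NN%)"
PATTERN = re.compile(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?')

def parse_entries(text):
    """Return [name, percentage] lists; percentage is NaN when absent."""
    entries = []
    for m in PATTERN.finditer(text):
        name, perc = m.group(1), m.group(2)  # group(2) is None if unmatched
        entries.append([name, int(perc) if perc is not None else math.nan])
    return entries

inputtext = 'Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl'
print(parse_entries(inputtext))
# [['Bbc', 57], ['Grameen', 54], ['Cninsure', 66], ['Mn-Public-Radio-Intl', nan]]
```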
Upvotes: 1