user3314418

Reputation: 3041

One way to grab both text and percentages with a regex in Python

I have the following string:

Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl

I'd like to obtain:

[BBC World Service, 57], [Grameen Bank, 54], [Cninsure Inc., 66], [Mn-Public-Radio-Intl, np.nan]

I was using the pattern .+?(?=\() but it is inadequate because Mn-Public-Radio-Intl isn't followed by a parenthesis. I would appreciate any help!

Upvotes: 0

Views: 37

Answers (1)

Martijn Pieters

Reputation: 1122482

Make the parenthesized percentage optional; for names without one, the second group is an empty string:

re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)

Demo:

>>> re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)
[('Bbc', '57'), ('Grameen', '54'), ('Cninsure', '66'), ('Mn-Public-Radio-Intl', '')]
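To make the pieces of that pattern easier to follow, here is the same expression written with re.VERBOSE and a comment per part. This is only an annotated sketch; the name NAME_PERCENT is made up for illustration and inputtext is assumed to hold the string from the question.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
# Whitespace inside the character class [\w -] is still significant in verbose mode.
NAME_PERCENT = re.compile(r"""
    (\b[\w -]+\b)      # group 1: name made of word characters, spaces and hyphens
    (?:                # non-capturing wrapper around the percentage part
        \s+\(          # whitespace followed by an opening parenthesis
        (\d+)          # group 2: the digits of the percentage
        %\)            # percent sign and closing parenthesis
    )?                 # the whole percentage part may be missing
""", re.VERBOSE)

# Assumed input: the string from the question.
inputtext = 'Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl'
matches = NAME_PERCENT.findall(inputtext)
```

The word boundaries around group 1 keep the name from swallowing the space before the parenthesis, and the trailing ? is what lets a bare name such as Mn-Public-Radio-Intl match with an empty second group.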

To get integers, with np.nan (that is, float('nan')) for a missing percentage, you can post-process the result:

import re
import numpy as np

[(name, int(perc) if perc else np.nan)
 for name, perc in re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)]

which then gives:

>>> [(name, int(perc) if perc else np.nan)
...  for name, perc in re.findall(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?', inputtext)]
[('Bbc', 57), ('Grameen', 54), ('Cninsure', 66), ('Mn-Public-Radio-Intl', nan)]
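If you would rather not depend on numpy just for the missing-value marker, the same post-processing can be sketched with the standard library alone. The function name parse_names is hypothetical, and float('nan') stands in for np.nan here.

```python
import math
import re

# Compact form of the pattern: a name, optionally followed by "(NN%)".
PATTERN = re.compile(r'(\b[\w -]+\b)(?:\s+\((\d+)%\))?')


def parse_names(text):
    """Return (name, percentage) pairs; the percentage is NaN when absent."""
    return [(name, int(perc) if perc else float('nan'))
            for name, perc in PATTERN.findall(text)]


# Assumed input: the string from the question.
inputtext = 'Bbc (57%); Grameen (54%); Cninsure (66%) Mn-Public-Radio-Intl'
pairs = parse_names(inputtext)

# NaN never compares equal to itself, so test for it with math.isnan.
missing = [name for name, perc in pairs if math.isnan(perc)]
```

Because NaN is not equal to anything, including itself, checks such as perc == float('nan') are always False; use math.isnan (or np.isnan) to find the entries without a percentage.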

Upvotes: 1
