Reputation: 967
I have a string column that follows this pattern:
yariyada up to a maximum of (number)% yariyada
For example:
will be granted up to a maximum of 75.5% If less, then nothing
I want to create another column that extracts the number that appears between "up to a maximum of" and "%".
So far I'm only able to detect whether the string column contains that pattern, using the .contains method.
If it helps to clarify, in Stata (I'm a Stata user) I would use regexm to break the string into parts and regexs to retrieve the parts. I'm wondering whether Pandas has a similar, or better, function.
Thanks for your help!
Upvotes: 0
Views: 2637
Reputation: 197
bigtable
color         region       finish
red, yellow   AK, NV, CA   a, b,c
red, blue     CA,TX, NV    a,c, p
blue, red     TX,CA, AK    p,a, c
blue, yellow  TX,CA, NV    p, c, a
yellow, red   AK,CA,NV     c, b, a
yellow,blue   CA,TX, NV    c, a, b
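# Column names of the table (avoid shadowing the built-in name `list`)
columns = list(bigtable)
for col in columns:
    # Split each cell on commas, sort the pieces, and join them back together
    bigtable1[col] = bigtable1[col].str.split(',', expand=True).apply(lambda x: pd.Series(np.sort(x)).str.cat(sep=','), axis=1)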
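A minimal, self-contained sketch of the same idea, in case it helps. It swaps the split/np.sort approach for a plain Python helper (sort_items is a hypothetical name), assumes bigtable1 is meant to be a copy of bigtable, and assumes stray spaces around the items should be ignored when sorting. Only two rows of the table above are reproduced.

```python
import pandas as pd

# Two rows from the table above, enough to show the effect
bigtable = pd.DataFrame({
    "color": ["red, yellow", "yellow,blue"],
    "region": ["AK, NV, CA", "CA,TX, NV"],
    "finish": ["a, b,c", "c, a, b"],
})

def sort_items(cell: str) -> str:
    # Hypothetical helper: split on commas, trim spaces, sort, rejoin
    return ",".join(sorted(part.strip() for part in cell.split(",")))

# Assumption: bigtable1 is a working copy of bigtable
bigtable1 = bigtable.copy()
for col in bigtable1.columns:
    bigtable1[col] = bigtable1[col].map(sort_items)

print(bigtable1)
```

Trimming the spaces matters here: without it, an item such as " NV" (with a leading space) would sort ahead of "CA".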
Upvotes: 0
Reputation: 76917
You could use the pandas.core.strings.StringMethods.extract method (available as Series.str.extract) to find groups in each string using the passed regular expression:
df['col_name'].str.extract('up to a maximum of (.*)%')
This will give you a new column with the extracted number.
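Here is a slightly fuller sketch of that approach, under a few assumptions: the column is called col_name as above, the new column name max_pct is made up for illustration, and the number is a plain decimal such as 75.5. The capture group is narrowed to digits with an optional decimal part, because a greedy (.*) would run up to the last "%" in the string if there were more than one.

```python
import pandas as pd

df = pd.DataFrame({
    "col_name": [
        "will be granted up to a maximum of 75.5% If less, then nothing",
        "up to a maximum of 10% of cost, 100% otherwise",
        "no limit stated",
    ]
})

# One capture group: digits with an optional decimal part, followed by "%"
pattern = r"up to a maximum of (\d+(?:\.\d+)?)%"

# expand=False returns a Series rather than a one-column DataFrame;
# rows that do not match the pattern come back as missing values
extracted = df["col_name"].str.extract(pattern, expand=False)

# The extracted values are strings, so convert them to numbers
df["max_pct"] = pd.to_numeric(extracted, errors="coerce")

print(df["max_pct"])
```

This plays roughly the role of regexm plus regexs in Stata: the pattern does the matching, and the parenthesised group is the part that gets retrieved.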
Upvotes: 2