chungkim271

Reputation: 967

Python pandas: extracting a portion of a string based on the pattern around it

I have a string column that follows this pattern:

yariyada up to a maximum of (number)% yariyada

For example:

will be granted up to a maximum of 75.5% If less, then nothing

I want to create another column that extracts that number that comes between "up to a maximum of" and "%".

So far I'm only able to detect whether the string column contains that pattern, using the .str.contains method.

If it helps to clarify: in Stata (I'm a Stata user), I would use regexm to break the string into parts and regexs to retrieve the parts. I'm wondering if pandas has a similar (or better!) function.

Thanks for your help!

Upvotes: 0

Views: 2637

Answers (2)

Anu

Reputation: 197

bigtable

color         region      finish
red, yellow   AK, NV, CA  a, b,c
red, blue     CA,TX, NV   a,c, p
blue, red     TX,CA, AK   p,a, c
blue, yellow  TX,CA, NV   p, c, a
yellow, red   AK,CA,NV    c, b, a
yellow,blue   CA,TX, NV   c, a, b

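    import numpy as np
    import pandas as pd

    cols = list(bigtable)  # column names (renamed to avoid shadowing the built-in list)
    for col in cols:  # bigtable1 is presumably a working copy of bigtable
        bigtable1[col] = bigtable1[col].str.split(',', expand=True).apply(lambda x: pd.Series(np.sort(x)).str.cat(sep=','), axis=1)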

Upvotes: 0

Zero

Reputation: 76917

You could use the pandas.core.strings.StringMethods.extract method (called as Series.str.extract) to find groups in each string using the passed regular expression:

df['col_name'].str.extract('up to a maximum of (.*)%')

This gives you a new column with the number extracted.
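As a fuller illustration, here is a minimal sketch of the same idea on made-up data. The sample strings and the max_pct column name are assumptions for the example, not part of the question. It tightens the capture group to a numeric pattern so the match stops at the number rather than at the last "%" in the string, passes expand=False to get a Series back, and converts the result to float.

```python
import pandas as pd

# Hypothetical sample data; 'col_name' follows the placeholder used above.
df = pd.DataFrame({
    'col_name': [
        'will be granted up to a maximum of 75.5% If less, then nothing',
        'paid up to a maximum of 20% of the total',
        'no limit stated here',
    ]
})

# \d+(?:\.\d+)? matches an integer or a decimal number, so the capture
# cannot run past the number to a later '%' in the same string.
pattern = r'up to a maximum of (\d+(?:\.\d+)?)%'

# expand=False returns a Series for a single capture group;
# rows that do not match the pattern come back as missing values.
df['max_pct'] = df['col_name'].str.extract(pattern, expand=False).astype(float)

print(df['max_pct'].tolist())
```

Rows without the pattern end up as NaN in the new column, which plays a role similar to checking the return value of regexm in Stata before calling regexs.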

Upvotes: 2
