asimo
asimo

Reputation: 2500

Split pandas column on number with %

I have a df with one of the columns that appears like:

**Share**
We are safe 25%
We are always safe 12.50% (India Aus, West)
We are ok (USA, EU)
We are not OK
What is this
Always wise 25.66%

I want to split this column such that the % values wherever applicable get split from the column into a new one. So the output would be

Share                  Percent    LOCATION
We are safe            25%  
We are always safe     12.50%     India Aus, West
We are ok                         USA, EU
We are not OK
What is this
Always wise            25.66%

I thought the below would split it from right, but it is not working

df['Percent'] = df['Share'].str.rsplit(r' \d',1).str[0]

Upvotes: 2

Views: 57

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627103

You can extract those values:

df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")

Pandas test:

import pandas as pd
df = pd.DataFrame({'Share':['We are safe 25%','We are ok', 'We are always safe 12.50%']})
df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")
>>> df
                Share Percent
0         We are safe     25%
1           We are ok        
2  We are always safe  12.50%

See the regex demo. Details:

  • \s+ - one or more whitespaces
  • (?=\d+(?:\.\d+)?%\s*$) - a positive lookahead matching a location that is immediately followed with:
    • \d+ - one or more digits
    • (?:\.\d+)? - an optional sequence of . and one or more digits
    • % - a % symbol
    • \s* - 0 or more trailing (as $ comes next) whitespaces and
    • $ - end of string.

Upvotes: 2

Related Questions