Reputation: 2500
I have a df with one of the columns that appears like:
**Share**
We are safe 25%
We are always safe 12.50% (India Aus, West)
We are ok (USA, EU)
We are not OK
What is this
Always wise 25.66%
I want to split this column such that the % values wherever applicable get split from the column into a new one. So the output would be
Share Percent LOCATION
We are safe 25%
We are always safe 12.50% India Aus, West
We are ok USA, EU
We are not OK
What is this
Always wise 25.66%
I thought the below would split it from right, but it is not working
df['Percent'] = df['Share'].str.rsplit(r' \d',1).str[0]
Upvotes: 2
Views: 57
Reputation: 627103
You can extract those values:
df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")
Pandas test:
import pandas as pd
df = pd.DataFrame({'Share':['We are safe 25%','We are ok', 'We are always safe 12.50%']})
df[['Share','Percent']] = df['Share'].str.split(r'\s+(?=\d+(?:\.\d+)?%\s*$)',expand=True).fillna("")
>>> df
Share Percent
0 We are safe 25%
1 We are ok
2 We are always safe 12.50%
See the regex demo. Details:
\s+
- one or more whitespaces(?=\d+(?:\.\d+)?%\s*$)
- a positive lookahead matching a location that is immediately followed with:
\d+
- one or more digits(?:\.\d+)?
- an optional sequence of .
and one or more digits%
- a %
symbol\s*
- 0 or more trailing (as $
comes next) whitespaces and$
- end of string.Upvotes: 2