Reputation: 165
I need to create new column in pandas dataframe.
df
Id Value
1 >45%
2 >29%
3 <30 to >69
4 >40% to <56%
5 --
df_output:
Id Value Value2
1 >45% 45%
2 >29% 29%
3 <30 to >69 69
4 >40% to <56% 56%
5 -- NaN
In my Dataframe there is column called Value, i need to create new column"Value2" on top that column. In Value2 column i need only one number with suffix (if exist in value column) based on the value column.if there are multiple numbers like Id 3&4, i need the highest one among them.
I tried:
df.Value.str.extract('(\d+)')
But it only extracts the numbers.
Upvotes: 2
Views: 147
Reputation: 627082
You can use
import pandas as pd
import numpy as np
import re
df = pd.DataFrame({'Value':['>45%', '<30 to >69', '<80% and >16%', '...']})
def lookup(x):
m = list(re.finditer(r'(\d+(?:\.\d+)?)\s*%?', x))
if m:
return max(m, key=lambda x: float(x.group(1)))[0]
else:
return np.nan
df['Value2'] = df['Value'].apply(lookup)
# >>> df
# Value Value2
# 0 >45% 45%
# 1 <30 to >69 69
# 2 <80% and >16% 80%
# 3 ... NaN
Details:
re.finditer
, get all matches of the (\d+(?:\.\d+)?)\s*%?
pattern that extracts one or more digits followed with an optional sequence of a .
and one or more digits (capturing this value into Group 1) and then an optional %
signlookup(x)
method return np.NaN
if no matches are found, else, it returns the match value that contains the biggest number in Group 1.Regex details:
(\d+(?:\.\d+)?)
- Group 1: one or more digits followed with an optional occurrence of a .
and one or more digits (int or float number)\s*
- zero or more whitespaces%?
- an optional %
signSee the regex demo.
Upvotes: 2
Reputation: 8768
This should work too:
df['Value2'] = df['Value'].str.split(r'[<>]').str[-1]
Upvotes: 1
Reputation: 133610
You could use following code in Pandas. Written and tested with shown samples. Using df.str.extract
function of Pandas.
df["Value2"] = df['Value'].str.extract('[<>](\d+[%]?$)', expand=False)
Output of DataFrame's Value2 column will be as follows:
df["Value2"]
0 45%
1 29%
2 69
3 56%
Explanation of regex: matching either <
OR >
followed by 1 or more digits followed by %(optional) at end of value; in a capturing group.
Upvotes: 2
Reputation: 893
df['Value2'] = df['Value'].str.extract(r'(\d+%?)$', expand=False)
This pattern will match a sequence of digits at the end of a string, with or without a percent sign at the very end.
Upvotes: 0