James Lin
James Lin

Reputation: 165

How to Create New Column in Pandas?

I need to create new column in pandas dataframe.

df 

Id    Value       
1     >45%         
2     >29%        
3     <30 to >69    
4     >40% to <56% 
5     -- 

df_output:

Id    Value         Value2
1     >45%          45%
2     >29%          29%
3     <30 to >69    69
4     >40% to <56%  56%
5     --            NaN  

In my Dataframe there is column called Value, i need to create new column"Value2" on top that column. In Value2 column i need only one number with suffix (if exist in value column) based on the value column.if there are multiple numbers like Id 3&4, i need the highest one among them.

I tried:

df.Value.str.extract('(\d+)')

But it only extracts the numbers.

Upvotes: 2

Views: 147

Answers (4)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627082

You can use

import pandas as pd
import numpy as np
import re

df = pd.DataFrame({'Value':['>45%', '<30 to >69', '<80% and >16%', '...']})
def lookup(x):
    m = list(re.finditer(r'(\d+(?:\.\d+)?)\s*%?', x))
    if m:
        return max(m, key=lambda x: float(x.group(1)))[0]
    else:
        return np.nan

df['Value2'] = df['Value'].apply(lookup)
# >>> df
#            Value Value2
# 0           >45%    45%
# 1     <30 to >69     69
# 2  <80% and >16%    80%
# 3            ...    NaN

Details:

  • With re.finditer, get all matches of the (\d+(?:\.\d+)?)\s*%? pattern that extracts one or more digits followed with an optional sequence of a . and one or more digits (capturing this value into Group 1) and then an optional % sign
  • The lookup(x) method return np.NaN if no matches are found, else, it returns the match value that contains the biggest number in Group 1.

Regex details:

  • (\d+(?:\.\d+)?) - Group 1: one or more digits followed with an optional occurrence of a . and one or more digits (int or float number)
  • \s* - zero or more whitespaces
  • %? - an optional % sign

See the regex demo.

Upvotes: 2

rhug123
rhug123

Reputation: 8768

This should work too:

df['Value2'] = df['Value'].str.split(r'[<>]').str[-1]

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133610

You could use following code in Pandas. Written and tested with shown samples. Using df.str.extract function of Pandas.

df["Value2"] = df['Value'].str.extract('[<>](\d+[%]?$)', expand=False)

Output of DataFrame's Value2 column will be as follows:

df["Value2"]
0   45%
1   29%
2   69
3   56%

Explanation of regex: matching either < OR > followed by 1 or more digits followed by %(optional) at end of value; in a capturing group.

Upvotes: 2

Steele Farnsworth
Steele Farnsworth

Reputation: 893

df['Value2'] = df['Value'].str.extract(r'(\d+%?)$', expand=False)

This pattern will match a sequence of digits at the end of a string, with or without a percent sign at the very end.

Upvotes: 0

Related Questions