emily_relax

Reputation: 89

Extract number from column to make a new column in Pandas

I have a column named Rate, shown below. I want to create a new column, Rate_New, that contains only the numeric part of Rate, which is 0.25. How can I do that? Thanks!

import pandas as pd 
df = pd.DataFrame({'Rate':['$0.25/Wh', '$0.25/Wh', '$0.25/Wh', '$0.25/Wh']})
df

Upvotes: 2

Views: 3395

Answers (6)

Sun

Reputation: 64

Here is my solution; you can copy and paste it:

df['Rate_New'] = df.Rate.apply(lambda x: float(x.replace("$","").replace("/Wh","")))

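Or this, without apply or attribute-style access. Pass regex=False so that "$" is treated as a literal character, and convert the result to float:

df["Rate_New"] = df["Rate"].str.replace("$", "", regex=False).str.replace("/Wh", "", regex=False).astype(float)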
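Why the literal/regex distinction matters here: in a regular expression "$" is the end-of-string anchor, and the default of the regex argument of str.replace has differed between pandas versions. A minimal sketch that spells the choice out, assuming the sample frame from the question (the name stripped is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Rate": ["$0.25/Wh", "$0.25/Wh", "$0.25/Wh", "$0.25/Wh"]})

# regex=False makes "$" a literal dollar sign instead of the
# regex end-of-string anchor, whatever the pandas default is.
stripped = (
    df["Rate"]
    .str.replace("$", "", regex=False)
    .str.replace("/Wh", "", regex=False)
)

# The stripped values are still strings, so convert them before
# storing them as the new column.
df["Rate_New"] = stripped.astype(float)
```

Stating regex=False explicitly keeps the behaviour the same whichever default your installed pandas version uses.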

Here is a version using a regex, again without attribute-style access or apply:

repl = lambda m: m.group(1)
df["Rate_New"] = df["Rate"].str.replace(r'\$(.+?)\/Wh', repl, regex=True).astype(float)

Upvotes: 3

MarianD

Reputation: 14131

df["Rate_New"] = df.Rate.str.split(r"[$/]").apply(lambda x: x[1]).astype(float)
  • The .str.split() method converts original values into lists:

    0    [, 0.25, Wh]
    1    [, 0.25, Wh]
    2    [, 0.25, Wh]
    3    [, 0.25, Wh]
    Name: Rate, dtype: object           
    
  • The lambda function extracts middle elements (i.e. with index 1) from these lists:

    0    0.25
    1    0.25
    2    0.25
    3    0.25
    Name: Rate, dtype: object
    
  • The .astype() method then changes the data type to float:

    0    0.25
    1    0.25
    2    0.25
    3    0.25
    Name: Rate, dtype: float64
    
  • df["Rate_New"] = then assigns the resulting series as a new column of your dataframe:

           Rate  Rate_New
    0  $0.25/Wh      0.25
    1  $0.25/Wh      0.25
    2  $0.25/Wh      0.25
    3  $0.25/Wh      0.25
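The steps above can be followed one at a time in a short sketch. It assumes the sample frame from the question; the intermediate names parts and middle are only for illustration, and .str[1] is swapped in for the apply/lambda step as an equivalent way to pick the element at index 1:

```python
import pandas as pd

df = pd.DataFrame({"Rate": ["$0.25/Wh", "$0.25/Wh", "$0.25/Wh", "$0.25/Wh"]})

# Step 1: split every value on "$" or "/", giving a list per row.
parts = df["Rate"].str.split(r"[$/]")

# Step 2: take the element at index 1 of each list
# (.str[1] is used here in place of apply(lambda x: x[1])).
middle = parts.str[1]

# Step 3: convert the extracted strings to floats and
# Step 4: assign the result as the new column.
df["Rate_New"] = middle.astype(float)
```

Keeping the intermediate series around makes it easy to inspect each stage if the real data contains values in a different format.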
    

Upvotes: 2

MarianD

Reputation: 14131

df["Rate_New"] = df.Rate.str.replace(r"\$(.+)/Wh", lambda m: m.group(1), regex=True).astype(float)
  • The regular expression has one capturing group: all characters between $ and /Wh.
  • The lambda function replaces the original string with this capturing group.
  • The .astype() method then changes the data type (from "object") to float.
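The capturing-group idea above can be tried out in a small sketch, assuming the sample frame from the question. regex=True is written out because a callable replacement only works in regex mode. The second variant uses str.extract, which is a different technique from the one in this answer: it returns the capturing group directly instead of replacing the whole match with it. The names replaced and extracted are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Rate": ["$0.25/Wh", "$0.25/Wh", "$0.25/Wh", "$0.25/Wh"]})

# Replace the whole match with the text of capturing group 1.
replaced = df["Rate"].str.replace(
    r"\$(.+)/Wh", lambda m: m.group(1), regex=True
)

# Alternative technique: pull the capturing group out directly.
# expand=False returns a Series rather than a one-column DataFrame.
extracted = df["Rate"].str.extract(r"\$(.+)/Wh", expand=False)

# Either series holds strings; convert before storing the new column.
df["Rate_New"] = extracted.astype(float)
```

One practical difference: for rows that do not match the pattern, str.replace leaves the original string in place, while str.extract returns a missing value.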

Upvotes: 0

isabella

Reputation: 467

You can strip the unwanted parts and convert the result to a number as follows:

df["Rate_New"] = df["Rate"].apply(lambda x: float(x.replace("$", "").replace("/Wh", "")))

Upvotes: 0

Wayne Lambert

Reputation: 606

This can be achieved with the following code, which you can place in separate Jupyter cells:

    # Import the Pandas and regex libraries 
    import pandas as pd
    import re

    # Use your dataframe
    df = pd.DataFrame({'Rate':['$0.25/Wh', '$0.25/Wh', '$0.25/Wh', '$0.25/Wh']})

    # State regex pattern that extracts the value
    pattern = r"[,\d]+\.?\d*"

    # Iterate over rate column of dataframe and perform regex to extract value
    search = []    
    for values in df['Rate']:
        search.append(re.search(pattern, values).group())

    df['Rate_New'] = search

    # Display the result of the dataframe with appended column
    df

Upvotes: -1

Pragya

Reputation: 111

First, add a function that extracts the number from Rate, say extract_rate():

def extract_rate(rate):
    return rate.replace('$', '').replace('/Wh', '')

Then you can apply the function to every row to generate the new column.

df['Rate_new'] = df.apply(lambda row: extract_rate(row['Rate']), axis=1)
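A variation on the same idea, as a minimal sketch rather than the answer's exact code: since only one column is involved, the helper can be applied to that column directly with Series.apply instead of row by row. This version of extract_rate also returns a float, which is an assumption about what the asker wants; the frame is the sample from the question.

```python
import pandas as pd


def extract_rate(rate: str) -> float:
    """Strip the "$" prefix and the "/Wh" suffix and return the number."""
    return float(rate.replace("$", "").replace("/Wh", ""))


df = pd.DataFrame({"Rate": ["$0.25/Wh", "$0.25/Wh", "$0.25/Wh", "$0.25/Wh"]})

# Apply the helper to each value of the Rate column.
df["Rate_New"] = df["Rate"].apply(extract_rate)
```

Applying to the single column avoids building a row object for every record, which the axis=1 form has to do.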

Upvotes: 0
