Reputation: 89
I have a column named Rate, shown below. I want to create a new column, Rate_New, that contains only the numeric part of Rate, which is 0.25. How can I do that? Thanks!!
import pandas as pd
df = pd.DataFrame({'Rate':['$0.25/Wh', '$0.25/Wh', '$0.25/Wh', '$0.25/Wh']})
df
Upvotes: 2
Views: 3395
Reputation: 64
Here is my solution; you can copy and paste it:
df['Rate_New'] = df.Rate.apply(lambda x: float(x.replace("$","").replace("/Wh","")))
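Or this, without apply and without attribute-style column access. Pass regex=False so that "$" is treated as a literal character rather than as the regex end-of-string anchor:
df["Rate"].str.replace("$", "", regex=False).str.replace("/Wh", "", regex=False)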
Here is a version using regex, again without attribute-style access or apply:
repl = lambda m: m.group(1)
df["Rate"].str.replace(r'\$(.+?)\/Wh', repl, regex=True)
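To see what the capture-group replacement does to a single value, here is a minimal sketch that uses only Python's standard re module. The variable names are illustrative, and the sample string is taken from the question. As far as I know, .str.replace with regex=True performs the same kind of substitution on each element of the column.

```python
import re

sample = "$0.25/Wh"

# (.+?) captures the text between the literal "$" and the literal "/Wh".
pattern = r"\$(.+?)/Wh"

# The callable receives the match object and returns only the captured group,
# so the whole match is replaced by the number it contains.
extracted = re.sub(pattern, lambda m: m.group(1), sample)

# The result is still a string; convert it if a numeric value is needed.
rate = float(extracted)
print(extracted, rate)
```

Note that the replacement returns strings, so the column still needs .astype(float) if you want numbers.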
Upvotes: 3
Reputation: 14131
df["Rate_New"] = df.Rate.str.split(r"[$/]").apply(lambda x: x[1]).astype(float)
The .str.split() method converts the original values into lists:
0    [, 0.25, Wh]
1    [, 0.25, Wh]
2    [, 0.25, Wh]
3    [, 0.25, Wh]
Name: Rate, dtype: object
The lambda function extracts the middle element (i.e. the one with index 1) from each of these lists:
0    0.25
1    0.25
2    0.25
3    0.25
Name: Rate, dtype: object
The .astype() method then changes the data type to float:
0    0.25
1    0.25
2    0.25
3    0.25
Name: Rate, dtype: float64
df["Rate_New"] = then assigns the resulting series as a new column of your dataframe:
       Rate  Rate_New
0  $0.25/Wh      0.25
1  $0.25/Wh      0.25
2  $0.25/Wh      0.25
3  $0.25/Wh      0.25
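The same three steps can be traced on a single value with the standard library alone. This is only a sketch of what happens per element, with illustrative variable names; it is not how pandas implements the operation internally.

```python
import re

sample = "$0.25/Wh"

# Step 1: split on either "$" or "/". Inside a character class "$" is literal.
# The leading "$" produces an empty first element.
parts = re.split(r"[$/]", sample)

# Step 2: take the middle element (index 1), which is the number as text.
middle = parts[1]

# Step 3: convert the text to a float.
rate = float(middle)
print(parts, middle, rate)
```

The empty string at index 0 is why the number sits at index 1 rather than index 0.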
Upvotes: 2
Reputation: 14131
df["Rate_New"] = df.Rate.str.replace(r"\$(.+)/Wh", lambda m: m.group(1), regex=True).astype(float)
The replacement removes $ and /Wh. The .astype() method then changes the data type (from "object") to float.
Upvotes: 0
Reputation: 467
You can replace them as follows:
df["Rate_new"] = df["Rate"].apply(lambda x: x.replace("$", "").replace("/Wh", ""))
Upvotes: 0
Reputation: 606
This can be achieved with the following code, which you can place in separate Jupyter cells:
# Import the Pandas and regex libraries
import pandas as pd
import re
# Use your dataframe
df = pd.DataFrame({'Rate':['$0.25/Wh', '$0.25/Wh', '$0.25/Wh', '$0.25/Wh']})
# Regex pattern that matches the value: digits (and commas), an optional decimal point, more digits
pattern = r"[,\d]+\.?\d*"
# Iterate over the Rate column and extract the first match from each value
search = []
for value in df['Rate']:
    search.append(re.search(pattern, value).group())
df['Rate_New'] = search
# Display the result of the dataframe with appended column
df
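One caveat with the loop above: re.search returns None when nothing matches, so calling .group() on a value without a number raises AttributeError. A minimal sketch of a guard follows. The helper name extract_number, the simplified pattern, and the sample values are all assumptions made for illustration, not part of the original answer.

```python
import re

# Hypothetical, simplified pattern: an integer or a decimal number.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def extract_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None

# Illustrative values, including one that contains no number.
values = ["$0.25/Wh", "n/a", "$3/Wh"]
results = [extract_number(v) for v in values]
print(results)
```

Returning None keeps the list the same length as the column, so it can still be assigned to df['Rate_New'].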
Upvotes: -1
Reputation: 111
First, define a function that extracts the number from Rate, say extract_rate():
def extract_rate(rate):
    return rate.replace('$', '').replace('/Wh', '')
Then, you can apply the method to the entire column to generate a new column.
df['Rate_new'] = df.apply(lambda row: extract_rate(row['Rate']), axis=1)
Upvotes: 0