Reputation: 399
I have a dataframe column with either a single integer value or a range of two integers. I want to create a final column where the range values are averaged, so that the column become purely integer.
I am trying to use pandas.str.find("-") to find the location of break point and then use np.where() to create separate columns for First Value and Second Value in Range.
import numpy as np
import pandas as pd
d = {'A' : ['1234', '12 - 16'], 'Avg':[1234, 14]}
df= pd.DataFrame(d)
df['bp'] = df['A'].str.find("-")
df['F'] = np.where(df['bp']>0, df['A'].str.slice(0, df['bp']), df['A'])
I am getting NAN where range is present in column. Expected Output is in Column "Avg".
Upvotes: 1
Views: 1378
Reputation: 25259
Use extractall
and call mean
directly on level=0
df.A.str.extractall(r'(\d+)').astype(int).mean(level=0)[0]
Out[64]:
0 1234
1 14
Name: 0, dtype: int32
Upvotes: 0
Reputation: 5451
you can do it using vectorization (with out apply function like below) using str function and explode (pandas above 0.25)
your index must be unique or you need to call df.reset_index
for this to work
import pandas as pd
d = {'A' : ['1234', '12 - 16'], 'Avg':[1234, 14]}
df= pd.DataFrame(d)
df["A"].str.split("-").explode().astype(pd.np.int).groupby(level=0).mean()
Upvotes: 0
Reputation: 59274
Using str.split
df['A'].str.split(' - ').apply(lambda s: sum(map(int,s))/len(s),1)
0 1234.0
1 14.0
Name: A, dtype: float64
Upvotes: 2