Reputation: 436
I am exploring the pandas library and found this dataset. My task is to fill the ? values with the mean of each group, grouping by the column 'num-of-doors'. When I used dataframe.groupby('num-of-doors').mean(), pandas did not compute the mean of these columns:
'peak-rpm', 'price', 'bore', 'stroke', 'normalized-losses', 'horsepower'
To find out why, I tried a small dataset of my own. I created a file, data.csv, with the following contents:
c0,c1,type
1,2,0
2,3,0
2,4,0
1,?,1
1,3,1
and wrote the following script:
import numpy as np
import pandas as pd

data = pd.read_csv("data.csv")
data = data.replace('?', np.nan)
print(data)
print(data.groupby('type').mean())
This is the output I get:
   c0   c1  type
0   1    2     0
1   2    3     0
2   2    4     0
3   1  NaN     1
4   1    3     1
            c0
type
0     1.666667
1     1.000000
Can you please explain what is going on here? Why am I not getting the mean for column c1? I tried some Stack Overflow answers, but nothing worked. Any suggestions?
I'd really appreciate your help.
Upvotes: 2
Reputation: 61910
The problem is that c1 is not numeric (its dtype is object), so convert it first:
data = data.replace('?',np.nan)
data['c1'] = data['c1'].astype(float)
print(data.groupby('type').mean())
Output
            c0   c1
type
0     1.666667  3.0
1     1.000000  3.0
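When you read the original data into a DataFrame, the ? makes pandas parse c1 as strings, so the column has dtype object, and replacing ? with NaN does not change that. Older versions of pandas silently drop such non-numeric columns from groupby().mean(), which is why c1 is missing from your result (pandas 2.0 and later raise a TypeError instead). You can verify the dtypes with data.dtypes: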
c0       int64
c1      object
type     int64
dtype: object
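A minimal, self-contained sketch of the same diagnosis, assuming the sample rows from the question are inlined via io.StringIO rather than read from data.csv. It swaps in pd.to_numeric with errors='coerce' (an alternative to the replace + astype route above, not what this answer uses) so that any non-numeric string, including ?, becomes NaN in one step:

```python
import io

import pandas as pd

# The sample rows from the question, inlined instead of read from data.csv
CSV = """c0,c1,type
1,2,0
2,3,0
2,4,0
1,?,1
1,3,1
"""

data = pd.read_csv(io.StringIO(CSV))
# The '?' forces c1 to be parsed as strings rather than numbers
print(data.dtypes)

# Coerce c1 to numbers: '?' (or any other non-numeric string) becomes NaN
data["c1"] = pd.to_numeric(data["c1"], errors="coerce")
print(data.dtypes)

# Now c1 is float64, so it takes part in the group means
means = data.groupby("type").mean()
print(means)
```

errors='coerce' is convenient when you do not know every placeholder in advance, but it also hides genuinely malformed values, so the explicit replace('?', np.nan) can be the safer choice when ? is the only marker.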
If you want to replace the NaN values with the mean of their group, use transform + fillna:
data = data.replace('?',np.nan)
data['c1'] = data['c1'].astype(float)
res = data.groupby('type').transform('mean')
print(data.fillna(res))
Output
   c0   c1  type
0   1  2.0     0
1   2  3.0     0
2   2  4.0     0
3   1  3.0     1
4   1  3.0     1
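For the original dataset, the same transform + fillna idea extends to several columns at once. A sketch, assuming a hypothetical toy frame that reuses a few column names from the question ('num-of-doors', 'price', 'horsepower') with made-up values, not the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names taken from the question,
# values made up for illustration only
df = pd.DataFrame({
    "num-of-doors": ["two", "two", "four", "four", "four"],
    "price": ["13495", "?", "13950", "17450", "?"],
    "horsepower": ["111", "154", "?", "115", "110"],
})

cols = ["price", "horsepower"]

# Step 1: turn '?' into NaN and make the columns numeric
df[cols] = df[cols].replace("?", np.nan).astype(float)

# Step 2: each row's group mean, aligned to the original index
group_means = df.groupby("num-of-doors")[cols].transform("mean")

# Step 3: fill only the missing entries with their group's mean
df[cols] = df[cols].fillna(group_means)
print(df)
```

transform returns a frame with the same index as df, which is what lets fillna match every NaN to the mean of its own group.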
As a last piece of advice, you could read the CSV as:
data = pd.read_csv("data.csv", na_values='?')
print(data)
Output
   c0   c1  type
0   1  2.0     0
1   2  3.0     0
2   2  4.0     0
3   1  NaN     1
4   1  3.0     1
This saves you from converting the columns to numeric yourself: read_csv treats ? as a missing value, so c1 is parsed as float64 directly.
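Combining na_values='?' with the transform + fillna step gives a short end-to-end version. A sketch, again assuming the question's sample is inlined with io.StringIO instead of read from data.csv:

```python
import io

import pandas as pd

CSV = """c0,c1,type
1,2,0
2,3,0
2,4,0
1,?,1
1,3,1
"""

# na_values='?' makes read_csv treat '?' as missing while parsing,
# so c1 comes back as float64 with no extra conversion step
data = pd.read_csv(io.StringIO(CSV), na_values="?")
print(data.dtypes)

# Fill each NaN with the mean of its 'type' group
data = data.fillna(data.groupby("type").transform("mean"))
print(data)
```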
Upvotes: 3
Reputation: 26676
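# '?' -> 'NaN' (regex=False makes the match literal), then convert to float
df['c1'] = df['c1'].str.replace('?', 'NaN', regex=False).astype(float)
# fill each NaN with its 'type' group's mean; group_keys=False keeps the original index
df = df.groupby('type', group_keys=False).apply(lambda x: x.fillna(x.mean()))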
Upvotes: 2