Reputation: 4077
I have this dataframe:
import numpy as np
import pandas as pd
data = {'My_name': ["abc", "nc", "there", ""], 'Val1': [44.20, 22, None, 44], 'Val2': [50, 20, 40, 72.2]}
df1 = pd.DataFrame(data)
My_name Val1 Val2
0 abc 44.2 50.0
1 nc 22.0 20.0
2 there NaN 40.0
3 44.0 72.2
4 there 28 60
And I used the following to get the mean of the values grouped by My_name:
df2 = df1.where(pd.notnull(df1), None)
dcm = df2.groupby(['My_name']).agg([np.mean])
Exception: All objects passed were None
I've tried various tests and realized the error is caused by the None values when computing the mean.
I tried using the following instead to take care of the None values:
df3 = df2.where(pd.notnull(df2['Val1']), None)
df4 = df3.where(pd.notnull(df3['Val2']), None)
dcm2 = df4.groupby(['My_name']).agg([np.mean])
but I still get the same error. How do I ignore the NaN without having it spoil the mean?
Something like this would also do: creating two dataframes, one without None values (in Val1 and Val2) and the other with None values, e.g.:
df_sub:
My_name Val1 Val2
0 abc 44.2 50.0
1 nc 22.0 20.0
3 44.0 72.2
4 there 28 60
and df_sub2:
My_name Val1 Val2
2 there NaN 40.0
df.dropna() looks like a good function to do it, so I did:
df_sub = df2.dropna(subset=['Val1','Val2'])
How do I get the second dataframe?
Upvotes: 0
Views: 11662
Reputation: 139162
First, I don't think you need to replace the NaN values with None, as NaN is the default indicator for missing values and is ignored by mean by default in pandas (mean has a skipna parameter that is True by default).
Furthermore, replacing it with None will make the columns object dtype (no longer numeric), and not all operations will work as expected.
So just try to do the grouping operation on the original dataframe:
dcm = df1.groupby(['My_name']).agg([np.mean])
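To illustrate the point about NaN being skipped, here is a minimal sketch using the four-row data from the question. It calls the groupby `mean()` method directly rather than passing `np.mean` to `agg`, which is my substitution and should give the same per-group means; the name `obj_col` is only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question: four rows, one missing Val1.
data = {
    "My_name": ["abc", "nc", "there", ""],
    "Val1": [44.20, 22, None, 44],
    "Val2": [50, 20, 40, 72.2],
}
df1 = pd.DataFrame(data)

# None in a numeric column is stored as NaN, so Val1 stays float64.
print(df1.dtypes)

# mean() skips NaN by default (skipna=True): only three values are averaged.
print(df1["Val1"].mean())

# Group means on the original frame; a group whose values are all NaN
# simply gets NaN for that column instead of raising.
dcm = df1.groupby("My_name")[["Val1", "Val2"]].mean()
print(dcm)

# Forcing None back in requires object dtype, so the column is no longer numeric.
obj_col = df1["Val1"].astype(object).where(df1["Val1"].notna(), None)
print(obj_col.dtype)
```

The "there" group has no valid Val1, so its Val1 mean is NaN while its Val2 mean is still computed.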
Secondly, to split your dataframe, you can do:
In [26]: df1[pd.isnull(df1[['Val1', 'Val2']]).any(axis=1)]
Out[26]:
My_name Val1 Val2
2 there NaN 40
and alternatively df1[pd.notnull(df1[['Val1', 'Val2']]).all(axis=1)]
for the other subset, but this is indeed equivalent to the shorter df1.dropna(subset=['Val1','Val2'])
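Putting the two subsets together, a short sketch on the question's four-row data might look like this. The mask name `has_missing` is my own; `isna()` is the method form of `pd.isnull`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "My_name": ["abc", "nc", "there", ""],
    "Val1": [44.20, 22, None, 44],
    "Val2": [50, 20, 40, 72.2],
})

# True for rows where Val1 or Val2 is missing.
has_missing = df1[["Val1", "Val2"]].isna().any(axis=1)

df_sub2 = df1[has_missing]   # rows with at least one missing value
df_sub = df1[~has_missing]   # rows with both values present

# dropna with a flat list of column names selects the same complete rows.
same_as_dropna = df_sub.equals(df1.dropna(subset=["Val1", "Val2"]))

print(df_sub)
print(df_sub2)
```

Building the mask once and negating it keeps the two subsets complementary by construction.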
Upvotes: 3