Hypothetical Ninja

Reputation: 4077

All Objects Passed Were None, Pandas

I have this dataframe:

import numpy as np
import pandas as pd
data = {'My_name':["abc","nc","there","","there"], 'Val1':[44.20,22,None,44,28], 'Val2':[50,20,40,72.2,60]}
df1 = pd.DataFrame(data)

  My_name     Val1  Val2
0   abc       44.2  50.0
1   nc        22.0  20.0
2   there     NaN   40.0
3             44.0  72.2  
4   there     28.0  60.0

I used the following to get the mean of the values, grouped by My_name:

 df2 = df1.where(pd.notnull(df1), None)  
 dcm = df2.groupby(['My_name']).agg([np.mean]) 

Exception: All objects passed were None

I've tried various tests and realized the error is because of the None whilst computing the mean. I tried using the following instead to take care of None values:

df3 = df2.where(pd.notnull(df2['Val1']), None)
df4 = df3.where(pd.notnull(df3['Val2']), None) 
dcm2 = df4.groupby(['My_name']).agg([np.mean])  

but I still get the same error. How do I ignore the NaN without having it spoil the mean?

Something like this would also do: creating two dataframes, one without None values (in Val1 and Val2) and the other with the None values, e.g.:

df_sub:

  My_name     Val1  Val2
0   abc       44.2  50.0
1   nc        22.0  20.0
3             44.0  72.2
4   there     28.0  60.0

and df_sub2:

  My_name     Val1  Val2
2   there     NaN   40.0

df.dropna() looks like a good function for this, so I did:

df_sub = df2.dropna(subset=['Val1','Val2'])

How do I get the second dataframe?

Upvotes: 0

Views: 11662

Answers (1)

joris

Reputation: 139162

First, I don't think you need to replace the NaN values with None, as NaN is the default indicator for missing values and will be ignored by mean by default in pandas (mean has a skipna parameter that is True by default).
Furthermore, replacing it with None will make the columns of object dtype (not numeric anymore) and not all operations will work as expected.

So just try to do the grouping operation on the original dataframe:

dcm = df1.groupby(['My_name']).agg([np.mean])
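
To illustrate the point about NaN being skipped, here is a minimal sketch. It assumes the five-row frame shown in the question's printed table (including the second 'there' row), and it uses the plain .mean() method on the groupby rather than agg([np.mean]); the column selection is only there to keep the example explicit.

```python
import numpy as np
import pandas as pd

# Assumed data: the five rows printed in the question.
df1 = pd.DataFrame({
    "My_name": ["abc", "nc", "there", "", "there"],
    "Val1": [44.20, 22, None, 44, 28],
    "Val2": [50, 20, 40, 72.2, 60],
})

# None in a numeric column is stored as NaN, so both value columns are float64.
print(df1.dtypes)

# mean() skips NaN by default (skipna=True), so for 'there'
# only the single non-missing Val1 value contributes.
dcm = df1.groupby("My_name")[["Val1", "Val2"]].mean()
print(dcm)
```

The missing Val1 entry is simply left out of the 'there' group's average instead of raising an error or turning the result into NaN.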

Secondly, to split your dataframe, you can do:

In [26]: df1[pd.isnull(df1[['Val1', 'Val2']]).any(axis=1)]
Out[26]:
  My_name  Val1  Val2
2   there   NaN    40

and alternatively df1[pd.notnull(df1[['Val1', 'Val2']]).all(axis=1)] for the other subset, which is equivalent to the shorter df1.dropna(subset=['Val1','Val2']).
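
Put together, the split could look like the sketch below. It again assumes the five-row frame from the question's printed table; the names mask, df_sub and df_sub2 are chosen here to match the question and are not part of pandas.

```python
import pandas as pd

# Assumed data: the five rows printed in the question.
df1 = pd.DataFrame({
    "My_name": ["abc", "nc", "there", "", "there"],
    "Val1": [44.20, 22, None, 44, 28],
    "Val2": [50, 20, 40, 72.2, 60],
})

# True for every row that has a missing value in Val1 or Val2.
mask = df1[["Val1", "Val2"]].isnull().any(axis=1)

df_sub2 = df1[mask]    # rows with at least one missing value
df_sub = df1[~mask]    # rows with both values present

# The complete-rows subset matches what dropna returns for the same columns.
same_as_dropna = df_sub.equals(df1.dropna(subset=["Val1", "Val2"]))

print(df_sub)
print(df_sub2)
print(same_as_dropna)
```

Building the mask once and negating it keeps the two subsets complementary, so every row ends up in exactly one of them.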

Upvotes: 3
