anon_swe

Reputation: 9345

Pandas: Calculating column-wise mean yields nulls

I have a pandas DataFrame, df, and I'd like to get the mean for columns 180 through the end (not including the last column), only using the first 100K rows.

If I use the whole DataFrame:

df.mean().isnull().any()

I get False

If I use only the first 100K rows:

train_means = df.iloc[:100000, 180:-1].mean()
train_means.isnull().any()

I get: True

I'm not sure how this is possible, since the second approach is only getting the column means for a subset of the full DataFrame. So if no column in the full DataFrame has a mean of NaN, I don't see how a column in a subset of the full DataFrame can.

For what it's worth, I ran:

df.columns[df.isna().all()].tolist()

and I get: []. So I don't think I have any columns where every entry is NaN (which would cause a NaN in my train_means calculation).

Any idea what I'm doing incorrectly?

Thanks!

Upvotes: 1

Views: 142

Answers (1)

BENY

Reputation: 323226

Try checking:

 (df.iloc[:100000, 180:-1].isnull().sum()==100000).any()

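If this returns True, at least one column is entirely NaN in the first 100000 rows.

As for why you get no nulls when taking the mean of the whole DataFrame: mean has skipna=True by default, so it drops NaN values before averaging. A column's mean is NaN only when every value in the selected rows is NaN. That can be true for the first 100000 rows even when the column has valid values further down, which is also why your df.isna().all() check on the full DataFrame returns an empty list.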
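Here is a minimal sketch that reproduces the behaviour on toy data. The DataFrame, the column names "a" and "b", and the 3-row slice are made up for illustration and stand in for your real columns and the 100000-row slice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "b" is NaN in the first 3 rows only.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [np.nan, np.nan, np.nan, 10.0, 20.0],
})

# Full DataFrame: skipna=True drops the NaNs, so "b" averages 10 and 20.
full_means = df.mean()

# First 3 rows only: "b" has no valid values left, so its mean is NaN.
head_means = df.iloc[:3].mean()

full_has_nan = full_means.isnull().any()
head_has_nan = head_means.isnull().any()

# Run the all-NaN check on the slice, not on the whole DataFrame.
all_nan_cols = df.columns[df.iloc[:3].isna().all()].tolist()

print(full_has_nan, head_has_nan, all_nan_cols)
```

Running the same all-NaN check on your own slice, df.iloc[:100000, 180:-1], should list the columns responsible for the NaN means.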

Upvotes: 2
