Lostferret

Reputation: 39

Incorrect mean from pandas DataFrame

So here's an interesting thing:

Using Python 2.7:

I've got a dataframe of about 5,100 entries, each with a number (melting point) in a column titled 'Tm'. Using the code:

self.sort_df[['Tm']].mean(axis=0)

I get a mean of:

Tm    92.969204
dtype: float64

This doesn't make sense because no entry has a Tm greater than 83.

Does .mean() not work for this many values? I've tried paring down the dataset and it seems to work for ~1,000 entries, but considering I have a full dataset of 150,000 to run at once, I'd like to know if I need to find a different way to calculate the mean.

Upvotes: 0

Views: 3354

Answers (1)

fixxxer

Reputation: 16154

A more readable syntax would be:

sort_df['Tm'].mean()
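
As a small illustration (the values below are made up, not the asker's data), both spellings should compute the same number. The double-bracket form selects a one-column DataFrame, so its mean comes back wrapped in a Series, while the single-bracket form returns a plain float:

```python
import pandas as pd

# Hypothetical melting points, for illustration only.
sort_df = pd.DataFrame({"Tm": [61.5, 70.0, 82.5]})

# One-column DataFrame -> Series indexed by column name.
as_series = sort_df[["Tm"]].mean(axis=0)

# Single column (a Series) -> one float.
as_scalar = sort_df["Tm"].mean()

print(as_series)
print(as_scalar)
```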

Try sort_df['Tm'].value_counts() or sort_df['Tm'].max() to see which values are present. Some unexpected values must have crept in.

The .mean() method gives an accurate result irrespective of the dataset size.
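
A rough sketch of that check, assuming the culprit is a handful of stray large values. The data here is invented: 5,000 plausible melting points plus a hypothetical placeholder value of 9999.0, and the cutoff of 83 is simply the ceiling mentioned in the question.

```python
import pandas as pd

# Hypothetical data: 5,000 plausible melting points plus a few stray
# placeholder values (9999.0) standing in for whatever crept into the column.
values = [50.0 + (i % 30) for i in range(5000)] + [9999.0] * 5
sort_df = pd.DataFrame({"Tm": values})

# The largest value exposes the outliers immediately.
print(sort_df["Tm"].max())

# Count, mean, min, max and quartiles at a glance.
print(sort_df["Tm"].describe())

# Rows above the expected ceiling, and how often each value occurs.
suspects = sort_df[sort_df["Tm"] > 83]
print(suspects["Tm"].value_counts())

# Mean over only the rows that fall within the expected range.
clean_mean = sort_df.loc[sort_df["Tm"] <= 83, "Tm"].mean()
print(clean_mean)
```

If the filtered mean looks sensible, the next step would be to work out where the out-of-range rows came from rather than to replace .mean().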

Upvotes: 1
