Incorrect mean from pandas DataFrame

Question

So here's an interesting thing:

Using Python 2.7:

I've got a dataframe of about 5,100 entries, each with a number (melting point) in a column titled 'Tm'. Using the code:

self.sort_df[['Tm']].mean(axis=0)

I get a mean of:

Tm    92.969204
dtype: float64

This doesn't make sense, because no entry has a Tm greater than 83.

Does .mean() not work for this many values? I've tried paring down the dataset and it seems to work for ~1,000 entries, but since I have a full dataset of 150,000 entries to run at once, I'd like to know whether I need to find a different way to calculate the mean.

fixxxer · Accepted Answer

A more readable syntax would be:

sort_df['Tm'].mean()
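As a minimal sketch of the difference between the two spellings (the small DataFrame here is a made-up stand-in for sort_df, not the asker's data): selecting with double brackets yields a one-column DataFrame, so .mean(axis=0) returns a Series labelled by column name, which is why the question's output shows "Tm" and "dtype: float64". Selecting with single brackets yields a Series, so .mean() returns a plain number. Both compute the same value.

```python
import pandas as pd

# Hypothetical stand-in for sort_df; the values are illustrative only.
sort_df = pd.DataFrame({"Tm": [60.0, 70.0, 80.0]})

# Double brackets select a one-column DataFrame, so the mean is a Series
# indexed by column name.
as_series = sort_df[["Tm"]].mean(axis=0)

# Single brackets select a Series, so the mean is a single float.
as_scalar = sort_df["Tm"].mean()

print(as_series)
print(as_scalar)
```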

Try sort_df['Tm'].value_counts() or sort_df['Tm'].max() to see which values are actually present. Some unexpected values must have crept in.

The .mean() function gives an accurate result irrespective of the size of the data.
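The check suggested above could look something like the following sketch. The data is invented for illustration (a single stray 9999.0 standing in for whatever unexpected value might be in the real column), and the cutoff of 83 is taken from the question's statement about the expected maximum.

```python
import pandas as pd

# Hypothetical stand-in for sort_df: plausible melting points plus one
# stray value of the kind that could inflate the mean.
sort_df = pd.DataFrame({"Tm": [61.2, 74.5, 82.9, 55.0, 9999.0, 70.1]})

tm = sort_df["Tm"]

# Largest value present; anything above the expected ceiling is suspect.
tm_max = tm.max()

# Rows whose Tm exceeds the expected maximum (83, per the question).
outliers = sort_df[tm > 83]

# Mean with and without the suspect rows, to see how much they matter.
mean_all = tm.mean()
mean_clean = tm[tm <= 83].mean()

print(tm_max)
print(outliers)
print(mean_all, mean_clean)
```

If max() comes back above 83, the rows in outliers are the ones to inspect before trusting the mean.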
