Reputation: 39
So here's an interesting thing:
Using python 2.7:
I've got a dataframe of about 5,100 entries, each with a number (melting point) in a column titled 'Tm'. Using the code:
self.sort_df[['Tm']].mean(axis=0)
I get a mean of:
Tm 92.969204
dtype: float64
This doesn't make sense because no entry has a Tm of greater than 83.
Does .mean() not work for this many values? I've tried paring down the dataset and it seems to work for ~1,000 entries, but considering I have a full dataset of 150,000 to run at once, I'd like to know if I need to find a different way to calculate the mean.
Upvotes: 0
Views: 3354
Reputation: 16154
A more readable syntax would be:
sort_df['Tm'].mean()
Try running sort_df['Tm'].value_counts()
or sort_df['Tm'].max()
to see what values are present. Some unexpected values must have crept in, since a mean can never exceed the largest value in the column.
The .mean() function gives an accurate result irrespective of the size of the data.
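As a rough sketch of that check, the snippet below builds a small made-up frame (the values and the 83 ceiling are assumptions taken from the question, not the asker's real data) and shows how a single bad entry can pull the mean above every plausible value. It only uses describe(), boolean indexing, and mean(), so it should behave the same way on the full dataset.

```python
import pandas as pd

# Hypothetical data: four plausible melting points plus one bad entry.
sort_df = pd.DataFrame({"Tm": [55.0, 61.5, 70.2, 83.0, 48000.0]})
tm = sort_df["Tm"]

# Summary statistics; a max far above the expected ceiling points to outliers.
print(tm.describe())

# Rows above the expected ceiling (83, per the question).
outliers = sort_df[tm > 83]
print(outliers)

# Compare the mean with and without the suspicious rows.
mean_all = tm.mean()
mean_clean = tm[tm <= 83].mean()
print(mean_all)
print(mean_clean)
```

If the outliers table is non-empty, the data contains values larger than expected, and the inflated mean comes from the data rather than from .mean() itself.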
Upvotes: 1