multigoodverse

Reputation: 8072

Calculate dataframe mean by skipping certain values in Python / Pandas

I need to calculate the mean of the first column of the dataframe, and I can do that with the mean() method. The problem is that the data sometimes contains -9999 values denoting missing observations. I know that NaN values are skipped automatically when Pandas calculates the mean, but -9999 values, of course, are not.
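A small sketch of the difference described above, using two made-up two-element Series: the NaN is skipped, but -9999 is just an ordinary number to Pandas, so it gets averaged in.

```python
import numpy as np
import pandas as pd

# NaN is excluded by mean() by default (skipna=True)
with_nan = pd.Series([2.0, np.nan])
# A -9999 sentinel is a regular value, so it is included in the average
with_sentinel = pd.Series([2, -9999])

nan_mean = with_nan.mean()            # 2.0
sentinel_mean = with_sentinel.mean()  # (2 + -9999) / 2 = -4998.5
print(nan_mean, sentinel_mean)
```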

Here is the code I tried. It calculates the mean of the column, but it includes the -9999 value in the calculation:

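import pandas
# Rows built from lists (sets are unordered); this order gives column 0 the
# values 2 and -9999 that produce the mean reported below
df = pandas.DataFrame([[2, 4, 6], [-9999, 1, 3]])
df[0].mean(skipna=-9999)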

But it yields a mean of -4998.5, i.e. (2 + (-9999)) / 2, which obviously includes the -9999 in the calculation.

Upvotes: 4

Views: 8884

Answers (2)

mnagel

Reputation: 6854

skipna is meant to be True or False, not a value to be skipped.

When reading your data, normalize it and replace -9999 with NaN.
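A minimal sketch of doing that at read time, assuming the data comes from a CSV file (the column names and contents below are hypothetical, wrapped in io.StringIO so the example is self-contained). The na_values parameter of pandas.read_csv marks the listed values as missing while parsing.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the real data file
raw = io.StringIO("a,b,c\n2,4,6\n-9999,1,3\n")

# na_values makes the parser treat -9999 as missing (NaN)
df = pd.read_csv(raw, na_values=[-9999])

# NaN is skipped by mean(), so only the 2 is averaged
col_mean = df["a"].mean()
print(df)
print(col_mean)
```

Normalizing at read time means every later calculation sees NaN, so no individual call has to remember the sentinel.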

Upvotes: 2

DSM

Reputation: 353179

The skipna arg is a boolean specifying whether or not to exclude NA/null values, not which values to ignore:

skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
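To illustrate what the flag actually controls, here is a small sketch on a made-up Series containing a real NaN: skipna only decides whether NaN is excluded, and it has no notion of a sentinel value like -9999.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan])

skipped = s.mean(skipna=True)  # default: NaN excluded, result 2.0
kept = s.mean(skipna=False)    # NaN included, so the result is NaN
print(skipped, kept)
```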

Assuming I understand what you're trying to do, you could replace -9999 by NaN:

In [41]: df[0].replace(-9999, np.nan)
Out[41]: 
0     2
1   NaN
Name: 0, dtype: float64

In [42]: df[0].replace(-9999, np.nan).mean()
Out[42]: 2.0
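A sketch of two variations on the same idea, assuming the question's data built from lists (column 0 holding 2 and -9999): replacing the sentinel across the whole frame gives NaN-aware means for every column at once, while a boolean mask drops -9999 from a single column without modifying the data.

```python
import numpy as np
import pandas as pd

# The question's data, built from lists instead of sets
df = pd.DataFrame([[2, 4, 6], [-9999, 1, 3]])

# Option 1: replace the sentinel everywhere, then take per-column means
col_means = df.replace(-9999, np.nan).mean()

# Option 2: leave df unchanged and filter out the sentinel with a mask
first_col = df[0]
first_mean = first_col[first_col != -9999].mean()

print(col_means)
print(first_mean)
```

The mask version avoids creating a NaN-filled copy, but it has to be repeated for every column and every statistic, so the replace approach is usually simpler when the sentinel appears throughout the data.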

Upvotes: 5
