Calculate dataframe mean by skipping certain values in Python / Pandas

Question

I need to calculate the mean of the first column of a DataFrame, which I can do with the mean() method. The problem: the data sometimes contains -9999 values that denote missing observations. I know that pandas skips NaN values by default when calculating the mean, but of course it does not skip -9999 values.

Here is the code I tried. It calculates the mean of the column, but it includes the -9999 value in the calculation:

import pandas
df = pandas.DataFrame([[2, 4, 6], [-9999, 1, 3]])  # lists, not sets: sets have no fixed order
df[0].mean(skipna=-9999)  # tried to make mean() skip -9999

But it returns a mean of -4998.5, i.e. (2 + (-9999)) / 2, so the -9999 is clearly being included in the calculation.

DSM · Accepted Answer

The skipna argument is a boolean specifying whether or not to exclude NA/null values; it does not specify which values to ignore:

skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA
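The difference is easy to see on a small made-up Series (the name `s` and its values are only for illustration). In this sketch skipna only changes how NaN is handled; a -9999 sentinel is an ordinary number either way.

```python
import numpy as np
import pandas as pd

# One genuine missing value (NaN) and one sentinel value (-9999).
s = pd.Series([2.0, np.nan, -9999.0])

# skipna=True (the default): NaN is dropped, but -9999 still counts.
with_sentinel = s.mean()          # (2 + -9999) / 2 = -4998.5

# skipna=False: any NaN makes the whole result NaN.
strict = s.mean(skipna=False)     # nan

print(with_sentinel, strict)
```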

Assuming I understand what you're trying to do, you could replace -9999 with NaN (with NumPy imported as np):

In [41]: df[0].replace(-9999, np.nan)
Out[41]: 
0     2
1   NaN
Name: 0, dtype: float64

In [42]: df[0].replace(-9999, np.nan).mean()
Out[42]: 2.0
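replace is not the only option. Below is a minimal sketch of a few alternatives, assuming the sentinel is always exactly -9999. It uses a list-based version of the question's frame, and the `SENTINEL` name and the CSV text are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

SENTINEL = -9999  # hypothetical name for the missing-value marker

# Column 0 is [2, -9999], matching the mean of -4998.5 seen in the question.
df = pd.DataFrame([[2, 4, 6], [SENTINEL, 1, 3]])
col = df[0]

# Option 1: boolean mask, keep only the rows that are not the sentinel.
masked_mean = col[col != SENTINEL].mean()        # 2.0

# Option 2: Series.mask turns matching values into NaN, which mean() skips.
mask_mean = col.mask(col == SENTINEL).mean()     # 2.0

# Option 3: replace the sentinel in the whole frame, then take column means.
col_means = df.replace(SENTINEL, np.nan).mean()  # 0: 2.0, 1: 2.5, 2: 4.5

# Option 4: treat the sentinel as missing when the data is read in.
csv_text = "a,b,c\n2,4,6\n-9999,1,3\n"
from_csv = pd.read_csv(io.StringIO(csv_text), na_values=[SENTINEL])
csv_mean = from_csv["a"].mean()                  # 2.0

print(masked_mean, mask_mean, csv_mean)
print(col_means)
```

None of these modify df in place. Option 4 is handy when the -9999 values come from a file, because they become NaN at load time and every later mean() call skips them automatically.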
