Reputation: 8072
I need to calculate the mean of the first column of a DataFrame, which I can do with the mean() method.
The problem: Sometimes, there are -9999 values in the data denoting missing observations.
I know that NaN values are inherently skipped when calculating the mean in Pandas, but this is not the case with -9999 values of course.
Here is the code I tried. It computes the mean of the column, but includes the -9999 value in the calculation:
import pandas
df = pandas.DataFrame([[2, 4, 6], [-9999, 1, 3]])
df[0].mean(skipna=-9999)
It yields a mean of -4998.5, which is clearly the result of including -9999 in the calculation.
Upvotes: 4
Views: 8884
Reputation: 6854
skipna is meant to be True or False, not a value to be skipped.
When reading your data, normalize it by replacing -9999 with NaN.
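If the data comes from a CSV file, one way to do this at load time is probably the na_values argument of pandas.read_csv. This is only a sketch: the inline CSV text and the column names a, b, c are made up for illustration and stand in for whatever file is actually being read.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for the real data file.
csv_text = "a,b,c\n2,4,6\n-9999,1,3\n"

# na_values tells read_csv to treat -9999 as missing (NaN) while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values=[-9999])

# mean() skips NaN by default (skipna=True), so -9999 no longer contributes.
first_col_mean = df["a"].mean()
print(first_col_mean)
```

Handling the sentinel while parsing means every later aggregation sees NaN, so nothing downstream needs to know about -9999.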
Upvotes: 2
Reputation: 353179
The skipna arg is a boolean specifying whether or not to exclude NA/null values, not which values to ignore:
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
Assuming I understand what you're trying to do, you could replace -9999 by NaN:
In [41]: df[0].replace(-9999, np.nan)
Out[41]:
0 2
1 NaN
Name: 0, dtype: float64
In [42]: df[0].replace(-9999, np.nan).mean()
Out[42]: 2.0
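As a self-contained sketch of the same idea, the sentinel can be replaced once across the whole frame so that every later aggregation skips it. The small frame below is illustrative, and the names cleaned, col0_mean, all_means and masked_mean are just placeholders; the boolean-mask variant at the end is an alternative that leaves the data untouched.

```python
import numpy as np
import pandas as pd

# Illustrative frame whose first column contains the -9999 sentinel.
df = pd.DataFrame([[2, 4, 6], [-9999, 1, 3]])

# Replace the sentinel everywhere once, then reuse the cleaned frame.
cleaned = df.replace(-9999, np.nan)

col0_mean = cleaned[0].mean()  # NaN is skipped by default (skipna=True)
all_means = cleaned.mean()     # per-column means, NaN skipped in each

# Alternative: filter out sentinel rows with a boolean mask instead.
masked_mean = df.loc[df[0] != -9999, 0].mean()

print(col0_mean, masked_mean)
```

Note that replace returns a new object, so the result has to be assigned (as with cleaned here) for the change to persist.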
Upvotes: 5