ilearn

Reputation: 193

Handling Zeros or NaNs in Pandas DataFrame Operations

I have a DataFrame (df) like the one shown below, where each column is sorted from largest to smallest for frequency analysis. Because the columns have different lengths, the shorter ones are padded with zeros (or NaN values).

    08FB006  08FC001  08FC003  08FC005  08GD004
------------------------------------------------
0       253      872      256    11.80     2660
1       250      850      255    10.60     2510
2       246      850      241    10.30     2130
3       241      827      235     9.32     1970
4       241      821      229     9.17     1900
5       232        0      228     8.93     1840
6       231        0      225     8.05     1710
7         0        0      225        0     1610
8         0        0      224        0     1590
9         0        0        0        0     1590
10        0        0        0        0     1550

I need to perform the following calculation on each column using only its actual records, that is, treating each column as having its own length and ignoring the zero padding. I have tried using NaN instead of zeros, but for some reason the operations do not work on NaN values.

Here is what I am trying to do with my df columns:

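from scipy import stats

shape_list1 = []
location_list1 = []
scale_list1 = []

for column in df.columns:
    # Fit a generalized Pareto distribution to the whole column (padding included)
    shape1, location1, scale1 = stats.genpareto.fit(df[column])

    shape_list1.append(shape1)
    location_list1.append(location1)
    scale_list1.append(scale1)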

Upvotes: 0

Views: 198

Answers (2)

Peter Leimbigler

Reputation: 11105

The syntax is messy, but change

shape1, location1, scale1=stats.genpareto.fit(df[column])

to

shape1, location1, scale1=stats.genpareto.fit(df[column][df[column].nonzero()[0]])

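Explanation: df[column].nonzero() returns a one-element tuple whose only element, element [0], is a numpy array holding the integer positions where df[column] is nonzero. With the default integer index, those positions coincide with the index labels, so you can index df[column] by them with df[column][df[column].nonzero()[0]]. Note that Series.nonzero() was deprecated in pandas 0.24 and removed in pandas 1.0; on newer versions, df[column].to_numpy().nonzero() returns the same positions.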
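As a minimal sketch of the same idea on current pandas, the snippet below takes the nonzero positions from the underlying NumPy array and selects them with iloc. The Series name and values are copied from the first column of the question; the variable names are only illustrative.

```python
import pandas as pd

# First column from the question, padded with zeros at the end.
col = pd.Series([253, 250, 246, 241, 241, 232, 231, 0, 0, 0, 0], name="08FB006")

# ndarray.nonzero() returns a one-element tuple; element [0] holds the
# integer positions of the nonzero entries.
positions = col.to_numpy().nonzero()[0]

# iloc selects by position, so this works whatever the index labels are.
nonzero_values = col.iloc[positions]

print(positions.tolist())
print(nonzero_values.tolist())
```

Using iloc rather than plain [] keeps the selection positional even if the DataFrame has a non-default index.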

Upvotes: 0

andersource

Reputation: 829

Assuming all values are positive (as your example and description suggest), try:

stats.genpareto.fit(df[df[column] > 0][column])

This filters each column so that the fit uses only its positive values. Or, if negative values are allowed, use:

stats.genpareto.fit(df[df[column] != 0][column])
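For context, here is a rough sketch of how that filter might slot into the loop from the question. The two sample columns are copied from the question; the extra dropna() call (in case NaN was used as padding instead of zeros) and the layout of the result table are assumptions added for illustration, not part of either answer.

```python
import pandas as pd
from scipy import stats

# Two columns copied from the question; the trailing zeros are padding.
df = pd.DataFrame({
    "08FB006": [253, 250, 246, 241, 241, 232, 231, 0, 0, 0, 0],
    "08FC001": [872, 850, 850, 827, 821, 0, 0, 0, 0, 0, 0],
})

rows = {}
for column in df.columns:
    # Keep only real records: nonzero, and not NaN if NaN was used as padding.
    values = df.loc[df[column] != 0, column].dropna()
    shape, location, scale = stats.genpareto.fit(values)
    rows[column] = {"n": len(values), "shape": shape,
                    "location": location, "scale": scale}

# One row per original column (hypothetical layout chosen for this example).
result = pd.DataFrame.from_dict(rows, orient="index")
print(result)
```

The n column records how many values each fit actually used, which makes it easy to check that the padding was excluded.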

Upvotes: 1
