Dheeraj

Reputation: 1202

Filter dataframe columns values greater than zero?

I have a CSV file which I am reading with pd.read_csv(File), and I am trying to keep only those rows whose values are greater than zero.

The dataframe has some empty cells, some negative values, and some numbers in exponential notation such as -1.72E+10.

Time              A      B       C       D       E       F         G
9/8/2017 8:40   1.29    0.27    1.78    0.23    0.33    0.05    -13.72
9/8/2017 9:00   1.28    0.26    1.78    0.22    0.35    0.02    -13.59
9/8/2017 9:20   1.43                         
9/8/2017 9:40   1.44    0.29    1.93    0.25    0.28    0.01    -13.92
9/8/2017 10:00  1.36    0.27    1.84    0.23    0.31    0.02    -13.77
9/8/2017 10:20  1.38    0.27    1.89    0.23    0.31    0.01    -13.83
9/8/2017 10:40      -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10   -1.72E+10
9/8/2017 11:00  1.4     0.28    1.88    0.24    0.28    0.02    -13.92
9/8/2017 11:20  1.43    0.28    1.92    0.24    0.29    0.02    -13.83

Whenever I run the code below, it doesn't filter the data.

df = df[df > 0]

The type of the columns is str instead of numpy.float64.

Can someone tell me the problem?

I want to filter the whole dataframe down to the rows whose values are greater than 0.

Upvotes: 4

Views: 25108

Answers (1)

jezrael

Reputation: 862611

I think you need any to check for at least one True per row:

df = df[(df > 0).any(axis=1)]

Or all to check whether every value in the row is True:

df = df[(df > 0).all(axis=1)]

# the last row and the first numeric column were modified so they contain no negative values
print (df)
             Time             A             B             C             D  \
0   9/8/2017 8:40  1.290000e+00  2.700000e-01  1.780000e+00  2.300000e-01   
1   9/8/2017 9:00  1.280000e+00  2.600000e-01  1.780000e+00  2.200000e-01   
2   9/8/2017 9:20  1.430000e+00           NaN           NaN           NaN   
3   9/8/2017 9:40  1.440000e+00  2.900000e-01  1.930000e+00  2.500000e-01   
4  9/8/2017 10:00  1.360000e+00  2.700000e-01  1.840000e+00  2.300000e-01   
5  9/8/2017 10:20  1.380000e+00  2.700000e-01  1.890000e+00  2.300000e-01   
6  9/8/2017 10:40  1.720000e+10 -1.720000e+10 -1.720000e+10 -1.720000e+10   
7  9/8/2017 11:00  1.400000e+00  2.800000e-01  1.880000e+00  2.400000e-01   
8  9/8/2017 11:20  1.430000e+00  2.800000e-01  1.920000e+00  2.400000e-01   

              E             F      G  
0  3.300000e-01  5.000000e-02 -13.72  
1  3.500000e-01  2.000000e-02 -13.59  
2           NaN           NaN    NaN  
3  2.800000e-01  1.000000e-02 -13.92  
4  3.100000e-01  2.000000e-02 -13.77  
5  3.100000e-01  1.000000e-02 -13.83  
6 -1.720000e+10 -1.720000e+10    NaN  
7  2.800000e-01  2.000000e-02 -13.92  
8  2.900000e-01  2.000000e-02  13.83  


df1 = df[(df > 0).all(axis=1)]
print (df1)
             Time     A     B     C     D     E     F      G
8  9/8/2017 11:20  1.43  0.28  1.92  0.24  0.29  0.02  13.83

df1 = df.loc[:, (df > 0).all()]
print (df1)
             Time             A
0   9/8/2017 8:40  1.290000e+00
1   9/8/2017 9:00  1.280000e+00
2   9/8/2017 9:20  1.430000e+00
3   9/8/2017 9:40  1.440000e+00
4  9/8/2017 10:00  1.360000e+00
5  9/8/2017 10:20  1.380000e+00
6  9/8/2017 10:40  1.720000e+10
7  9/8/2017 11:00  1.400000e+00
8  9/8/2017 11:20  1.430000e+00
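
The outputs above keep the Time column, which suggests the string comparison with 0 happened to succeed in that environment; on Python 3, comparing strings with 0 typically raises a TypeError. A minimal sketch that builds the mask from the numeric columns only is shown here. The small frame is a hypothetical cut-down version of the question's data, and the names positive, at_least_one and every_value are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data in the question.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40", "9/8/2017 11:20"],
    "A": [1.29, 1.43, -1.72e10, 1.43],
    "B": [0.27, np.nan, -1.72e10, 0.28],
    "G": [-13.72, np.nan, -1.72e10, 13.83],
})

# Compare only the numeric columns so the Time strings are never compared with 0.
# NaN > 0 evaluates to False, so rows with empty cells fail the all() test.
positive = df.select_dtypes(include="number") > 0

at_least_one = df[positive.any(axis=1)]  # rows with at least one positive value
every_value = df[positive.all(axis=1)]   # rows where every numeric value is positive

print(at_least_one)
print(every_value)
```

Because the boolean mask shares the index of df, it can be used to filter the full frame, so Time is kept in the result without taking part in the comparison.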

EDIT1:

To convert all columns except Time to floats:

cols = df.columns.difference(['Time'])
df[cols] = df[cols].astype(float)
print (df.dtypes)
Time     object
A       float64
B       float64
C       float64
D       float64
E       float64
F       float64
G       float64
dtype: object

df1 = df.loc[:, (df > 0).all()]
print (df1)
             Time             A
0   9/8/2017 8:40  1.290000e+00
1   9/8/2017 9:00  1.280000e+00
2   9/8/2017 9:20  1.430000e+00
3   9/8/2017 9:40  1.440000e+00
4  9/8/2017 10:00  1.360000e+00
5  9/8/2017 10:20  1.380000e+00
6  9/8/2017 10:40  1.720000e+10
7  9/8/2017 11:00  1.400000e+00
8  9/8/2017 11:20  1.430000e+00
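
If astype(float) fails because some cells hold text that cannot be parsed, pd.to_numeric with errors="coerce" is one possible alternative: it turns unparseable entries into NaN instead of raising. The sketch below is an assumption-laden illustration, not the asker's actual file: the CSV text, the "bad" entry, and reading with dtype=str (to mimic the str columns described in the question) are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file.
raw = io.StringIO(
    "Time,A,B\n"
    "9/8/2017 8:40,1.29,0.27\n"
    "9/8/2017 9:20,1.43,\n"
    "9/8/2017 10:40,-1.72E+10,bad\n"
    "9/8/2017 11:20,1.43,0.28\n"
)
# dtype=str is an assumption that reproduces string-typed columns.
df = pd.read_csv(raw, dtype=str)

# Every column except Time is converted; unparseable cells become NaN.
cols = df.columns.difference(["Time"])
numeric = df[cols].apply(pd.to_numeric, errors="coerce")

# Keep the rows in which every converted value is greater than zero.
positive_rows = df[(numeric > 0).all(axis=1)]
print(positive_rows)
```

Rows with an empty or unparseable cell are dropped here because NaN > 0 is False; use any(axis=1) instead if one positive value per row is enough.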

Upvotes: 9
