Skipping rows in a csv file using Python Pandas

Question

I have a quick question on reading csv file using Pandas. The CSV file which I have has the format:

Num  Date     Value1   Value2   Value3 
1   7/29/11    1        2        3       
2   7/22/11    NA       3        1       
3   7/15/11    5        NA       NA  
4   7/6/11    -1        0        2

I would like to read this csv file using pandas. However, I need to skip the rows which have either of the column entries as NA (in this case I need to skip rows 2 and 3).

Thanks!

Andy Hayden · Accepted Answer

As @Rutger points out you can simply use dropna:

In [11]: df.dropna(subset=['Value1', 'Value2', 'Value3'])
Out[11]:
   Num     Date  Value1  Value2  Value3
0    1  7/29/11       1       2       3
3    4   7/6/11      -1       0       2

.

The initial way I suggested (which is clearly not optimal), once you've read it in as a DataFrame you can remove these rows using notnull (you want to keep only those rows which are all notnull):

In [21]: df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull)
Out[21]:
  Value1 Value2 Value3
0   True   True   True
1  False   True   True
2   True  False  False
3   True   True   True

In [22]: df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull).all(1)
Out[22]:
0     True
1    False
2    False
3     True
dtype: bool

And select only those rows:

In [23]: df[df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull).all(1)]
Out[23]:
   Num     Date  Value1  Value2  Value3
0    1  7/29/11       1       2       3
3    4   7/6/11      -1       0       2

In [24]: df = df[df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull).all(1)]

Skipping rows in a csv file using Python Pandas

Answers (1)

Related Questions