Reputation: 1231
I have a quick question on reading csv file using Pandas. The CSV file which I have has the format:
Num Date Value1 Value2 Value3
1 7/29/11 1 2 3
2 7/22/11 NA 3 1
3 7/15/11 5 NA NA
4 7/6/11 -1 0 2
I would like to read this csv file using pandas. However, I need to skip the rows which have either of the column entries as NA (in this case I need to skip rows 2 and 3).
Thanks!
Upvotes: 1
Views: 1008
Reputation: 375535
As @Rutger points out you can simply use dropna
:
In [11]: df.dropna(subset=['Value1', 'Value2', 'Value3'])
Out[11]:
Num Date Value1 Value2 Value3
0 1 7/29/11 1 2 3
3 4 7/6/11 -1 0 2
.
The initial way I suggested (which is clearly not optimal), once you've read it in as a DataFrame you can remove these rows using notnull
(you want to keep only those rows which are all notnull
):
In [21]: df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull)
Out[21]:
Value1 Value2 Value3
0 True True True
1 False True True
2 True False False
3 True True True
In [22]: df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull).all(1)
Out[22]:
0 True
1 False
2 False
3 True
dtype: bool
And select only those rows:
In [23]: df[df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull).all(1)]
Out[23]:
Num Date Value1 Value2 Value3
0 1 7/29/11 1 2 3
3 4 7/6/11 -1 0 2
In [24]: df = df[df.loc[:, ['Value1', 'Value2', 'Value3']].apply(pd.notnull).all(1)]
Upvotes: 2