Reputation: 1194
Hi all I'm trying to delete rows based on a few conditions and needed some help. I have a dataframe with this structure, assuming there are more columns:
date city col_a col_b col_c ...
1/22/20 la 0 0 0
1/23/20 la 0 0 0
1/24/20 la 0 0 0
1/22/20 ny 3 0 1
1/23/20 ny 0 1 1
1/24/20 ny 0 1 0
1/22/20 sf 0 0 0
1/23/20 sf 0 2 0
1/24/20 sf 2 0 0
.
.
.
I would like to scan the dataframe based on the city
and the latest date for that city, and remove that entire set of rows if the latest date for that city has col_a
, col_b
AND col_c
values of 0
. Assume city
column will be unique, and I have to scan every unique value. So resulting dataframe should be:
date city col_a col_b col_c ...
1/22/20 ny 3 0 1
1/23/20 ny 0 1 1
1/24/20 ny 0 1 0
1/22/20 sf 0 0 0
1/23/20 sf 0 2 0
1/24/20 sf 2 0 0
.
.
.
I'm assuming this is a groupby problem but not sure how to get latest date and delete entirety of the rows. Any help is appreciated.
Upvotes: 0
Views: 30
Reputation: 323226
We can do transform
+ all
df = df[~df.filter(like='col').eq(0).all(1).groupby(df.city).transform(all)]
Out[389]:
date city col_a col_b col_c
3 1/22/20 ny 3 0 1
4 1/23/20 ny 0 1 1
5 1/24/20 ny 0 1 0
6 1/22/20 sf 0 0 0
7 1/23/20 sf 0 2 0
8 1/24/20 sf 2 0 0
Upvotes: 1