Pineapple P

Reputation: 83

Drop redundant rows for group in pandas

I have the following DataFrame:

import pandas as pd

data = {'id':  ['A','A','A','A','A','A',
                'A','A','A','A','A','A',
                'B','B','B','B','B','B',
               'C', 'C', 'C', 'C', 'C', 'C',
               'D', 'D', 'D', 'D', 'D', 'D'],
        'city':['London', 'London','London', 'London', 'London', 'London',
                'New York', 'New York', 'New York', 'New York', 'New York', 'New York',
                'Milan', 'Milan','Milan', 'Milan','Milan', 'Milan',
               'Paris', 'Paris', 'Paris', 'Paris', 'Paris', 'Paris',
               'Berlin', 'Berlin','Berlin', 'Berlin','Berlin', 'Berlin'],
        'year': [2000,2001, 2002, 2003, 2004, 2005,
                2000,2001, 2002, 2003, 2004, 2005,
                 2000,2001, 2002, 2003, 2004, 2005,
                2000,2001, 2002, 2003, 2004, 2005,
                2000,2001, 2002, 2003, 2004, 2005],
        't': [0,0,0,0,1,0,
              0,0,0,0,0,1,
              0,0,0,0,0,0,
              0,0,1,0,0,0,
             0,0,0,0,1,0]}

df = pd.DataFrame(data)

For each id - city group, I need to drop the rows for all years after the year in which t=1. For instance, id = A is in London and has t=1 in year=2004, so I want to drop the row for the group A - London where year=2005. Please note that if t never equals 1 for an id - city group over 2000-2005, I want to keep all of its rows (see, for instance, id = B in Milan).

The desired output:

import pandas as pd

data = {'id':  ['A','A','A','A','A',
                'A','A','A','A','A','A',
                'B','B','B','B','B','B',
               'C', 'C', 'C',
               'D', 'D', 'D', 'D', 'D'],
        'city':['London', 'London','London', 'London', 'London',
                'New York', 'New York', 'New York', 'New York', 'New York', 'New York',
                'Milan', 'Milan','Milan', 'Milan','Milan', 'Milan',
               'Paris', 'Paris', 'Paris',
               'Berlin', 'Berlin','Berlin', 'Berlin','Berlin'],
        'year': [2000,2001, 2002, 2003, 2004,
                2000,2001, 2002, 2003, 2004, 2005,
                 2000,2001, 2002, 2003, 2004, 2005,
                2000,2001, 2002,
                2000,2001, 2002, 2003, 2004],
        't': [0,0,0,0,1,
              0,0,0,0,0,1,
              0,0,0,0,0,0,
              0,0,1,
             0,0,0,0,1]}

df = pd.DataFrame(data)

Upvotes: 0

Views: 40

Answers (1)

jezrael

Reputation: 862581

The idea is to use a cumulative sum per group, but the values need to be shifted first; then all rows after the first 1 are removed with boolean indexing:

# if the years are not already sorted within each group, sort first
# df = df.sort_values(['id', 'city', 'year'])

df = df[~df.groupby(['id', 'city'])['t'].transform(lambda x: x.shift().cumsum()).gt(0)]

print(df)
   id      city  year  t
0   A    London  2000  0
1   A    London  2001  0
2   A    London  2002  0
3   A    London  2003  0
4   A    London  2004  1
6   A  New York  2000  0
7   A  New York  2001  0
8   A  New York  2002  0
9   A  New York  2003  0
10  A  New York  2004  0
11  A  New York  2005  1
12  B     Milan  2000  0
13  B     Milan  2001  0
14  B     Milan  2002  0
15  B     Milan  2003  0
16  B     Milan  2004  0
17  B     Milan  2005  0
18  C     Paris  2000  0
19  C     Paris  2001  0
20  C     Paris  2002  1
24  D    Berlin  2000  0
25  D    Berlin  2001  0
26  D    Berlin  2002  0
27  D    Berlin  2003  0
28  D    Berlin  2004  1
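
To see why the shift() is needed (it is what keeps the t=1 row itself; a plain cumsum would drop that row too), here is a minimal sketch of the intermediate values, assuming the `data` dict from the question is still in scope:

import pandas as pd

# rebuild the original frame (the line above overwrote df with the filtered result)
df = pd.DataFrame(data)

# running total of *earlier* t values within each id/city group:
# NaN on the first row of each group, and > 0 only on rows after the first t=1
running = df.groupby(['id', 'city'])['t'].transform(lambda x: x.shift().cumsum())

# rows where drop=True come after the first t=1; the t=1 row itself stays
# because its running total is still 0
print(df.assign(running=running, drop=running.gt(0))
        .query("id == 'A' and city == 'London'"))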

Upvotes: 1
