Delete all values in a dataframe that repeat in a pandas groupby

Question

I have a dataframe with a date column and another column of strings, some of which are repeated within the same date. I want to drop every string that repeats within a single date from the entire dataframe.

What I have looks like this:

df

date        name
01/01/2020  Lucas
01/01/2020  Marie
01/01/2020  Lucy
01/05/2020  Lucas 
01/05/2020  Marie 
01/05/2020  Lucas 
01/05/2020  Phil

In this case, 'Lucas' appears twice under the date '01/05/2020'. I therefore want to remove every row where df['name'] == 'Lucas', which should leave this dataframe:

df2

date        name
01/01/2020  Marie
01/01/2020  Lucy
01/05/2020  Marie 
01/05/2020  Phil

How can I do this?

Thanks!!

Derek O · Accepted Answer

You can find all of the names that are repeated within the same date and store them in a list, then keep only the rows whose name is not in that list. In your case repeated_names would be just ['Lucas'], but the approach generalizes.

import pandas as pd

# create a new dataframe where 'Lucas' and 'Marie' are each repeated within a single date
df = pd.DataFrame({'date':['01/01/2020']*3 + ['01/05/2020']*4 + ['01/06/2020']*4,'name':['Lucas','Marie','Lucy','Lucas','Marie','Lucas','Phil','Marie','Marie','Lucy','Lucas']})

Input dataframe:

df
          date   name
0   01/01/2020  Lucas
1   01/01/2020  Marie
2   01/01/2020   Lucy
3   01/05/2020  Lucas
4   01/05/2020  Marie
5   01/05/2020  Lucas
6   01/05/2020   Phil
7   01/06/2020  Marie
8   01/06/2020  Marie
9   01/06/2020   Lucy
10  01/06/2020  Lucas

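# names that occur more than once within the same date
repeated_names = df.loc[df.duplicated(subset=['date', 'name']), 'name'].unique().tolist()
# keep only the rows whose name is not in that list
df2 = df[~df['name'].isin(repeated_names)]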

Output dataframe (drops 'Lucas' and 'Marie'):

df2
         date  name
2  01/01/2020  Lucy
6  01/05/2020  Phil
9  01/06/2020  Lucy
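
Since the question title mentions groupby, here is a minimal sketch of the same idea using groupby instead of duplicated. It is an alternative to the approach above, not a replacement for it. It assumes the columns are named date and name as in the question; the variable name counts is only illustrative.

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({
    'date': ['01/01/2020']*3 + ['01/05/2020']*4 + ['01/06/2020']*4,
    'name': ['Lucas', 'Marie', 'Lucy', 'Lucas', 'Marie', 'Lucas',
             'Phil', 'Marie', 'Marie', 'Lucy', 'Lucas'],
})

# count how many times each (date, name) pair occurs
counts = df.groupby(['date', 'name']).size()

# names that occur more than once within at least one date
repeated_names = counts[counts > 1].index.get_level_values('name').unique()

# drop every row with one of those names, on any date
df2 = df[~df['name'].isin(repeated_names)]
print(df2)
```

On the example data this should leave the same three rows (index 2, 6 and 9) as the duplicated version. The groupby form may be easier to adapt if you later want a different threshold, for example names that appear three or more times on one date.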
