Reputation: 303
I have a dataframe with a date column and a column of strings, some of which are repeated within the same date. I want to drop every string that repeats within any single date from the whole dataframe.
What I have looks like this:
df
date name
01/01/2020 Lucas
01/01/2020 Marie
01/01/2020 Lucy
01/05/2020 Lucas
01/05/2020 Marie
01/05/2020 Lucas
01/05/2020 Phil
In this case, df['name'] == 'Lucas' is repeated in df['date'] == '01/05/2020', so I want to remove every row with df['name'] == 'Lucas' and end up with this dataframe:
df2
date name
01/01/2020 Marie
01/01/2020 Lucy
01/05/2020 Marie
01/05/2020 Phil
How can I do this?
Thanks!!
Upvotes: 1
Views: 66
Reputation: 19565
You can find all of the names that are repeated for the same date and store them in a list, then keep only the rows whose name is not in that list. In your case repeated_names will only contain ['Lucas'], but this is generalizable.
import pandas as pd

# create a new dataframe where 'Lucas' (on 01/05/2020) and 'Marie' (on 01/06/2020) are each repeated within a single date
df = pd.DataFrame({'date':['01/01/2020']*3 + ['01/05/2020']*4 + ['01/06/2020']*4,'name':['Lucas','Marie','Lucy','Lucas','Marie','Lucas','Phil','Marie','Marie','Lucy','Lucas']})
Input dataframe:
df
date name
0 01/01/2020 Lucas
1 01/01/2020 Marie
2 01/01/2020 Lucy
3 01/05/2020 Lucas
4 01/05/2020 Marie
5 01/05/2020 Lucas
6 01/05/2020 Phil
7 01/06/2020 Marie
8 01/06/2020 Marie
9 01/06/2020 Lucy
10 01/06/2020 Lucas
repeated_names = df[df.duplicated()]['name'].values.tolist()
df2 = df[~df['name'].isin(repeated_names)]
Output dataframe (drops 'Lucas' and 'Marie'):
df2
date name
2 01/01/2020 Lucy
6 01/05/2020 Phil
9 01/06/2020 Lucy
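One caveat: df.duplicated() with no arguments compares entire rows, so it only catches these repeats when date and name are the only columns. If your real dataframe has more columns, a sketch like the following may be safer. It restricts the check to the two relevant columns via the subset argument. The function name and the extra score column are hypothetical, added only for illustration.

```python
import pandas as pd

def drop_names_repeated_within_date(df, date_col="date", name_col="name"):
    """Drop every row whose name occurs more than once on any single date."""
    # Flag second and later occurrences of each (date, name) pair,
    # ignoring all other columns
    is_repeat = df.duplicated(subset=[date_col, name_col])
    repeated_names = df.loc[is_repeat, name_col].unique()
    # Keep only rows whose name was never repeated within a date
    return df[~df[name_col].isin(repeated_names)]

# The question's data plus a hypothetical extra column, which would
# stop a plain df.duplicated() from finding any repeats
df = pd.DataFrame({
    "date": ["01/01/2020"] * 3 + ["01/05/2020"] * 4,
    "name": ["Lucas", "Marie", "Lucy", "Lucas", "Marie", "Lucas", "Phil"],
    "score": [1, 2, 3, 4, 5, 6, 7],
})
df2 = drop_names_repeated_within_date(df)
print(df2)
```

On the question's data this keeps the Marie, Lucy and Phil rows and drops every Lucas row.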
Upvotes: 1