Reputation: 33
I have a dataset with person names and the dates of their visits. For each person, I need to remove rows where the same "Visit_date" appears more than once, so that only one row per person and date remains. The dataset is very large, so I need code that works on all of it. I have spent several days on this without success. Here is a sample:
   Person  Visit_date
0    John  11.09.2020
1    John  11.09.2020
2    John  11.08.2020
3    Andy  11.07.2020
4    Andy  11.09.2020
5    Andy  11.09.2020
6  George  11.09.2020
7  George  11.09.2020
8  George  11.07.2020
9  George  11.07.2020
The code should return:
   Person  Visit_date
0    John  11.09.2020
1    John  11.08.2020
2    Andy  11.07.2020
3    Andy  11.09.2020
4  George  11.09.2020
5  George  11.07.2020
Upvotes: 0
Views: 109
Reputation: 2082
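Hope this helps. Use df.drop_duplicates() to drop rows that repeat the same Person and Visit_date pair (the first occurrence is kept), then df.reset_index(drop=True) to renumber the index from 0.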
import pandas as pd
df = pd.DataFrame({
    "Person": ['John', 'John', 'John', 'Andy', 'Andy', 'Andy', 'George', 'George', 'George', 'George'],
    "Visit_date": ['11.09.2020', '11.09.2020', '11.08.2020', '11.07.2020', '11.09.2020', '11.09.2020', '11.09.2020', '11.09.2020', '11.07.2020', '11.07.2020'],
})
# Drop rows that are exact duplicates across all columns, keeping the first occurrence
df = df.drop_duplicates()
# Renumber the index 0..n-1 and discard the old index
df = df.reset_index(drop=True)
print(df)
[Result]:
   Person  Visit_date
0    John  11.09.2020
1    John  11.08.2020
2    Andy  11.07.2020
3    Andy  11.09.2020
4  George  11.09.2020
5  George  11.07.2020
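If the real data has more columns than the sample, plain drop_duplicates() compares every column, so two visits by the same person on the same date would both survive whenever any other column differs. A minimal sketch of restricting the comparison to the two relevant columns with the subset argument follows. The "Note" column and its values are hypothetical, added only to illustrate the difference.

```python
import pandas as pd

# "Note" is a hypothetical extra column, not part of the original sample
df = pd.DataFrame({
    "Person": ["John", "John", "John", "Andy"],
    "Visit_date": ["11.09.2020", "11.09.2020", "11.08.2020", "11.07.2020"],
    "Note": ["a", "b", "c", "d"],
})

# Compare only Person and Visit_date; keep the first row of each pair
deduped = (
    df.drop_duplicates(subset=["Person", "Visit_date"], keep="first")
      .reset_index(drop=True)
)
print(deduped)
```

Here the second John row is dropped even though its "Note" value differs from the first. Passing keep="last" instead would retain the last row of each pair.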
Upvotes: 1