Reputation: 391
I am trying to fill a pandas dataframe (Dataframe 2) with rows from an original dataframe (Dataframe 1). I've created a mock Dataframe 1 below:
Ref Number Name
1 Alpha
2 Alpha
3 Alpha
4 Alpha
5 Beta
6 Beta
7 Beta
8 Charlie
I want to delete every row whose Name value has already occurred in an earlier row, i.e. Dataframe 2 should look like:
Ref Number Name
1 Alpha
5 Beta
8 Charlie
The Ref Number doesn't matter in this instance. In my working files, I'm planning on adding a column to specify something, and then to refer to that when applying some function.
How would I go about this with Pandas? I've got a CSV with ~5000 rows and I want to limit that to a 2nd dataframe with ~1000.
Upvotes: 1
Views: 82
Reputation: 862741
Use drop_duplicates and specify the column Name as the one to check for duplicates:
df = df.drop_duplicates(subset='Name')
print(df)
Ref Number Name
0 1 Alpha
4 5 Beta
7 8 Charlie
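For completeness, here is a minimal self-contained sketch of the same idea that keeps the original dataframe intact and writes the result to a second one. The variable names df1 and df2 are just illustrative, and the sample data is rebuilt by hand from the mock table in the question; with a real file you would presumably load it with pd.read_csv instead. keep="first" is the default and is only spelled out to make the behaviour explicit, and reset_index(drop=True) is optional if you don't care about the old index labels.

```python
import pandas as pd

# Mock of Dataframe 1 from the question (hand-built, not read from a CSV).
df1 = pd.DataFrame({
    "Ref Number": [1, 2, 3, 4, 5, 6, 7, 8],
    "Name": ["Alpha"] * 4 + ["Beta"] * 3 + ["Charlie"],
})

# Keep only the first row for each distinct Name; df1 itself is not modified.
# reset_index(drop=True) renumbers the surviving rows from 0.
df2 = df1.drop_duplicates(subset="Name", keep="first").reset_index(drop=True)

print(df2)
```

This should leave df2 with the rows for Ref Number 1, 5 and 8. Passing keep="last" instead would keep the last row seen for each Name.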
Upvotes: 1