Creating a new dataframe with only 1 row per value

Question

I am trying to fill a pandas dataframe (Dataframe 2) with rows from an original dataframe (Dataframe 1). I've created a mock Dataframe 1 below:

Ref Number  Name
1           Alpha
2           Alpha
3           Alpha
4           Alpha
5           Beta
6           Beta
7           Beta
8           Charlie

I want to delete every row whose Name value has already occurred in an earlier row, i.e. Dataframe 2 should look like this:

Ref Number  Name
1           Alpha
5           Beta
8           Charlie

The Ref Number doesn't matter in this instance. In my working files, I'm planning on adding a column to specify something, and then to refer to that when applying some function.

How would I go about this with Pandas? I've got a CSV with ~5000 rows and I want to limit that to a 2nd dataframe with ~1000.

jezrael · Accepted Answer

Use drop_duplicates, specifying Name as the column to check for duplicates. By default it keeps the first occurrence of each value:

df = df.drop_duplicates('Name')
print(df)
   Ref Number     Name
0           1    Alpha
4           5     Beta
7           8  Charlie
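
If the original dataframe should stay intact and the result should go into a second dataframe, as the question describes, the same call can be assigned to a new variable. The following is a minimal sketch that rebuilds the mock data from the question; the names df1 and df2 are assumptions chosen for illustration, and reset_index(drop=True) is an optional extra that renumbers the rows from 0.

```python
import pandas as pd

# Mock of Dataframe 1 from the question (df1 is an illustrative name)
df1 = pd.DataFrame({
    "Ref Number": [1, 2, 3, 4, 5, 6, 7, 8],
    "Name": ["Alpha", "Alpha", "Alpha", "Alpha",
             "Beta", "Beta", "Beta", "Charlie"],
})

# Keep only the first row seen for each Name.
# drop_duplicates returns a new dataframe, so df1 is left unchanged.
# reset_index(drop=True) discards the old row labels (0, 4, 7) and renumbers from 0.
df2 = df1.drop_duplicates(subset="Name", keep="first").reset_index(drop=True)

print(df2)
```

Here df2 holds the rows with Ref Number 1, 5 and 8, while df1 still has all 8 rows. Passing keep="last" instead would retain the last row of each Name rather than the first.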
