bimarian

Reputation: 140

Keep First and Last entry of a Duplicate in a Dataframe Column

I have a big dataframe with many duplicates in it. I want to keep the first and last entry of each duplicate but drop every duplicate in between.

I've already tried calling df.drop_duplicates with keep='first' and keep='last' to get two dataframes, and then merging them back into one df so that I have the first and last entry of each duplicate, but that didn't work.

df_first = df
df_last = df

df_first['Path'].drop_duplicates(keep='first', inplace=True)
df_last['Path'].drop_duplicates(keep='last', inplace=True)

Thanks for your help in advance!

Upvotes: 3

Views: 1692

Answers (2)

Sreeram Gunasekaran

Reputation: 1

Using GroupBy.nth. This is updated code from a previous solution, adapted to get the nth entry (here the second entry of each duplicated value):

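import pandas as pd

def keep_second_dup(duplicate, Columnname):
        # Work on a copy and count how often each value of Columnname occurs
        duplicate = duplicate.copy()
        duplicate['Count'] = duplicate[Columnname].map(duplicate[Columnname].value_counts())
        # Split into rows whose value is duplicated and rows whose value is unique
        second_duplicate = duplicate[duplicate['Count'] > 1]
        residual = duplicate[duplicate['Count'] == 1]
        # Keep only the second entry (position 1) of each duplicated value
        sec = second_duplicate.groupby(Columnname, as_index=False).nth(1)
        final_data = pd.concat([sec, residual])
        # Remove the helper column before returning
        return final_data.drop('Count', axis=1)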

Upvotes: 0

jezrael

Reputation: 863216

Use GroupBy.nth with the positions [0, -1]. This also avoids duplicated rows when a group has length 1, because such a row is returned only once:

import pandas as pd

df = pd.DataFrame({
         'a':[5,3,6,9,2,4],
         'Path':list('aaabbc')
})
print(df)
   a Path
0  5    a
1  3    a
2  6    a
3  9    b
4  2    b
5  4    c

df = df.groupby('Path').nth([0, -1])
print(df)
      a
Path   
a     5
a     6
b     9
b     2
c     4
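
For comparison, here is a minimal sketch of the drop_duplicates-style idea from the question, written with boolean masks from DataFrame.duplicated instead. The attempt in the question likely had no effect because df_first and df_last are just two names for the same df, and drop_duplicates was called on the 'Path' column rather than on the frame itself. The sample data is the same as above; the names not_first, not_last and result are only illustrative. Unlike the nth output shown above, this keeps 'Path' as a regular column and keeps the original row index.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [5, 3, 6, 9, 2, 4],
    'Path': list('aaabbc'),
})

# True for every occurrence of a Path value except the first one
not_first = df.duplicated(subset='Path', keep='first')
# True for every occurrence of a Path value except the last one
not_last = df.duplicated(subset='Path', keep='last')

# Drop only the rows that are neither the first nor the last of their Path value
result = df[~(not_first & not_last)]
print(result)
```

A value that occurs only once (here 'c') is both the first and the last entry, so it is kept exactly once.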

Upvotes: 4
