Reputation: 29
I have the following dataframe, df
Index time block cell
0 9 25 c1
1 9 25 c1
2 33 35 c2
3 47 4 c1
4 47 17 c2
5 100 21 c1
6 120 21 c1
7 120 36 c2
The duplicates are to be dropped based on time column. However, there is a condition: - if two or more similar times have the same cells, for example, index 0 and index 1 have c1 then keep any of the columns. - if two or more similar times have different cells eg index 3 and 4 and index 6 and 7 then keep all the rows corresponding to duplicate times
The resulting data frame will be as follows: df_result =
Index time block cell
0 9 25 c1
2 33 35 c2
3 47 4 c1
4 47 17 c2
5 100 21 c1
6 120 21 c1
7 120 36 c2
Tried
df.drop_duplicates('time')
Upvotes: 0
Views: 61
Reputation: 6476
You can group by one of the desired columns, then drop the duplicates on the other column as follows:
df = pd.DataFrame({'time':[9,9,33,47,47,100,120,120],'block':[25,25,35,4,17,21,21,36],'cell': ['c1','c1','c2','c1','c2','c1','c1','c2']})
grouped = df.groupby('time')
final_df = pd.DataFrame({'time':[] ,'block':[],'cell':[]})
for ind, gr in grouped:
final_df = final_df.append(gr.drop_duplicates("cell"))
Upvotes: 0
Reputation: 384
You can achieve this by binning the original DataFrame into categories and then running drop_duplicates()
within each category.
import pandas as pd
df = pd.DataFrame({'time':[9,9,33,47,47,100,120,120],'block':[25,25,35,4,17,21,21,36],'cell':'c1;c1;c2;c1;c2;c1;c1;c2'.split(';')})
categories = df['cell'].astype('category').unique()
df2 = pd.DataFrame()
for category in categories:
df2 = pd.concat([df2, df[df['cell'] == category].drop_duplicates(keep='first')])
df2 = df2.sort_index()
This will result in df2
being
time block cell
0 9 25 c1
2 33 35 c2
3 47 4 c1
4 47 17 c2
5 100 21 c1
6 120 21 c1
7 120 36 c2
Upvotes: 1