Reputation: 227
I have this example dataset:
CPU_Sub_Series RAM Screen_Size Resolution Price
Intel i5 8 15.6 1920x1080 699
Intel i5 8 15.6 1920x1080 569
Intel i5 8 15.6 1920x1080 789
Ryzen 5 16 16.0 2560x1600 999
Ryzen 5 32 16.0 2560x1600 1299
I want to find the rows that are duplicates in every column except Price, drop them, and keep only the row with the lowest Price in each group of duplicates.
So the expected output looks like this:
CPU_Sub_Series RAM Screen_Size Resolution Price
Intel i5 8 15.6 1920x1080 569
Ryzen 5 16 16.0 2560x1600 999
Ryzen 5 32 16.0 2560x1600 1299
Should I sort by price first?
df.sort_values('Price')
And then what?
Upvotes: 1
Views: 1252
Reputation: 2484
In addition to @Daniele Bianco's answer, you can also get the result like this (a similar approach in a slightly different form):
import pandas as pd
df = pd.DataFrame({
'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
'RAM': [8, 8, 8, 16, 32],
'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
'Price': [699, 569, 789, 999, 1299]
})
df = df.groupby(["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"])['Price'].min().reset_index()
print(df)
# CPU_Sub_Series RAM Screen_Size Resolution Price
#0 Intel i5 8 15.6 1920x1080 569
#1 Ryzen 5 16 16.0 2560x1600 999
#2 Ryzen 5 32 16.0 2560x1600 1299
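If the real data has extra columns that are not part of the grouping key, selecting ['Price'].min() would drop them. A possible variant, sketched here on the same sample data, is to look up the row label of the cheapest entry in each group with idxmin() and keep those whole rows. The names key_cols and cheapest are only illustrative, and this assumes the index labels are unique.

```python
import pandas as pd

df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299]
})

# Columns that define a "duplicate" (everything except Price).
key_cols = ['CPU_Sub_Series', 'RAM', 'Screen_Size', 'Resolution']

# idxmin() returns the index label of the lowest Price within each group;
# .loc then picks those complete rows, so any other columns would be kept.
cheapest = df.loc[df.groupby(key_cols)['Price'].idxmin()].reset_index(drop=True)
print(cheapest)
```

On the sample data this should give the same three rows as the groupby/min version.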
Upvotes: 1
Reputation: 2701
df.groupby(["CPU_Sub_Series","RAM","Screen_Size","Resolution"], as_index=False).min()
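The sort-first idea from the question also works, if you prefer it: sort by Price in ascending order, then call drop_duplicates on the other columns so that the first (cheapest) row of each group is kept. A minimal sketch on the question's sample data, where key_cols and result are just illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299]
})

# Columns that define a "duplicate" (everything except Price).
key_cols = ['CPU_Sub_Series', 'RAM', 'Screen_Size', 'Resolution']

result = (
    df.sort_values('Price')                           # cheapest rows come first
      .drop_duplicates(subset=key_cols, keep='first') # keep the cheapest row per key
      .reset_index(drop=True)
)
print(result)
```

Note that the rows come out ordered by Price here, whereas the groupby version orders them by the grouping columns; for this sample the two orders happen to match.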
Upvotes: 4