Reputation: 85
My DataFrame looks like this:
APMC Commodity Year Month Price
1 A 2015 Jan 1232
1 A 2015 Jan 1654
2 A 2015 Jan 9897
2 A 2015 Feb 3467
2 B 2016 Jan 7878
2 B 2016 Feb 8545
2 B 2016 Feb 3948
I want to remove the second and last rows, because their values in the APMC, Year, Commodity and Month columns duplicate those of an earlier row. How do I do this? The original data set is huge, and I want to modify it in place (something like inplace=True).
Upvotes: 1
Views: 329
Reputation: 40878
You can specify columns on which to detect duplicates:
df.drop_duplicates(subset=['APMC', 'Year', 'Commodity', 'Month'],
                   inplace=True)
Result:
>>> df
APMC Commodity Year Month Price
0 1 A 2015 Jan 1232
2 2 A 2015 Jan 9897
3 2 A 2015 Feb 3467
4 2 B 2016 Jan 7878
5 2 B 2016 Feb 8545
Index labels of the rows removed:
>>> pd.RangeIndex(0, 7).difference(df.index)
Int64Index([1, 6], dtype='int64')
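If you want to check which rows will be dropped before changing anything, here is a minimal sketch that rebuilds the sample data from the question and uses `duplicated` with the same `subset`. The names `key_cols`, `dup_mask`, `removed` and `deduped` are just illustrative, and `keep="first"` is spelled out only to make the default explicit.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "APMC": [1, 1, 2, 2, 2, 2, 2],
    "Commodity": ["A", "A", "A", "A", "B", "B", "B"],
    "Year": [2015, 2015, 2015, 2015, 2016, 2016, 2016],
    "Month": ["Jan", "Jan", "Jan", "Feb", "Jan", "Feb", "Feb"],
    "Price": [1232, 1654, 9897, 3467, 7878, 8545, 3948],
})

# Columns that define a duplicate (Price is deliberately left out)
key_cols = ["APMC", "Year", "Commodity", "Month"]

# True for each row whose key combination already appeared in an earlier row
dup_mask = df.duplicated(subset=key_cols, keep="first")
removed = df[dup_mask]

# Same result as the inplace call, but returned as a new frame
deduped = df.drop_duplicates(subset=key_cols, keep="first")
```

Passing `keep="last"` would retain the last row of each duplicate group instead, and `keep=False` would drop every row that has a duplicate. Assigning the result back with `df = df.drop_duplicates(...)` leaves you with the same frame as `inplace=True`.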
Upvotes: 1