non_linear
non_linear

Reputation: 423

sorting DataFrame based on multiple conditions in pandas

I have a DataFrame similar to the following

date      mood    score   count  avg       abs
23/12/18 negative -50.893   137  -0.371     50.893
23/12/18 neutral  0.2193    10   0.0219     0.2193
23/12/18 positive 336.5098  673  0.5000     336.5098
24/12/18 positive 91.2414   232  -0.393     91.2414
24/12/18 neutral  0.063     14   0.0045     0.063
24/12/18 negative -649.697  1184 0.5487     649.697
25/12/18 negative -72.4142  8   -9.0517     72.4142
25/12/18 positive   0       0    0          0
25/12/18 neutral  323.0056  173 1.86708     323.0056
26/12/18 negative -12.0467   15 -.8031      12.0467

I want to apply the following conditions to this dataset.

Con: if the absolute value(abs) score on a date is the greatest (of 3 moods), keep that date only together with its other attributes. 
Con: No duplicate date is to be kept. So the dataset will be reduced quite a lot compared to its original size.

Expected Output

date       mood_corrected   score   count   avg          abs
23/12/18    positive      336.5098  673    0.50001456   336.5098
24/12/18    negative      649.697   1184   0.54873057   649.697
25/12/18    neutral       323.0056  173    1.86708439   323.0056
26/12/18    negative      -12.0467  15     -0.8031      12.0467

My code

import pandas as pd 

df =pd.read_csv('file.csv')

new_df= df.sort_values('abs', ascending=False).drop_duplicates(['date','mood'])

While I get result that sort the dataset based on absolute value**(abs)**, I still have the full dataset. It is not reduced. Any help is appreciated. Many thanks.

Note: I looked at stackoverflow but have not found a quite similar problem.

Upvotes: 0

Views: 858

Answers (1)

Sunghee Yun
Sunghee Yun

Reputation: 820

The following will do the job!

new_df = df.sort_values('abs', ascending=False).drop_duplicates(['date']).sort_values('date')

Upvotes: 3

Related Questions