Reputation: 533
I have to make a group by very simple but it does not work in my case. I can not reproduce the actual data but suppose that my DF is:
Cod Cost Date VAL
0 A123 123 2017-12-21 0.0
1 A123 123 2017-12-21 -2.0
2 A123 123 2017-12-21 -10.0
3 FB00 180 2016-12-11 80.0
4 FB00 180 2016-12-11 80.0
I have to make a groupby for Cod, Cost and Date and get the corresponding minimum of VAL with reset_index(). My code is:
DF = DF.groupby(['Cod', 'Cost','Date'])['VAL'].min().reset_index()
the expected is:
Cod Cost Date VAL
0 A123 123 2017-12-21 -10.0
1 FB00 180 2016-12-11 80.0
but it return:
Cod Cost Date VAL
0 A123 123 2017-12-21 0.0
1 A123 123 2017-12-21 0.0
2 A123 123 2017-12-21 80.0
3 FB00 180 2016-12-11 80.0
4 FB00 180 2016-12-11 80.0
Upvotes: 0
Views: 994
Reputation: 8826
I just simulated your sample and its works for me:
>>> df
Cod Cost Date VAL
0 A123 123 2017-12-21 0.0
1 A123 123 2017-12-21 -2.0
2 A123 123 2017-12-21 -10.0
3 FB00 180 2016-12-11 80.0
4 FB00 180 2016-12-11 80.0
>>> df.groupby(['Cod', 'Cost','Date'])['VAL'].min().reset_index() Cod Cost Date VAL
0 A123 123 2017-12-21 -10.0
1 FB00 180 2016-12-11 80.0
Applied on pandas: 0.23.3
You can also apply like below:
>>> df.groupby(['Cod', 'Cost','Date']).min().reset_index()
Cod Cost Date VAL
0 A123 123 2017-12-21 -10.0
1 FB00 180 2016-12-11 80.0
It should also work with 0.18.1
:
Python 3.5.1 (default, May 25 2016, 07:19:16)
[GCC 4.8.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.18.1'
Sample
>>> df
Cod Cost Date VAL
0 A123 123 2017-12-21 0.0
1 A123 123 2017-12-21 -2.0
2 A123 123 2017-12-21 -10.0
3 FB00 180 2016-12-11 80.0
4 FB00 180 2016-12-11 80.0
>>> df.groupby(['Cod', 'Cost','Date']).min().reset_index()
Cod Cost Date VAL
0 A123 123 2017-12-21 -10.0
1 FB00 180 2016-12-11 80.0
OR
>>> df.groupby(['Cod', 'Cost','Date'])['VAL'].min().reset_index()
Cod Cost Date VAL
0 A123 123 2017-12-21 -10.0
1 FB00 180 2016-12-11 80.0
Upvotes: 0
Reputation: 686
Like other contributors on here, you're code yields the expected results for me, but since this isn't the case for you allow me to propose another, though far less efficient, solution.
Starting by replicating your dataframe
df = pd.DataFrame([['A123', 123, '2017-12-21', 0], ['A123', 123, '2017-12-21', -2], ['A123', 123, '2017-12-21', -10], ['FB00', 180, '2016-12-11', 80], ['FB00', 180, '2016-12-11', 80]], columns=['Cod', 'Cost', 'Date', 'VAL'])
We can then sort the values by Cod, Cost, Date and VAL
df.sort_values(['Cod', 'Cost', 'Date', 'VAL'], ascending=True, inplace=True)
Output:
Cod Cost Date VAL
2 A123 123 2017-12-21 -10
1 A123 123 2017-12-21 -2
0 A123 123 2017-12-21 0
3 FB00 180 2016-12-11 80
4 FB00 180 2016-12-11 80
We can then drop duplicate Cod, Cost and Dates, keeping the first record (the lowest value), leaving us with your expected output.
df.drop_duplicates(['Cod', 'Cost', 'Date'], keep='first', inplace=True)
df.reset_index(drop=True, inplace=True)
With an output of:
Cod Cost Date VAL
0 A123 123 2017-12-21 -10
1 FB00 180 2016-12-11 80
Very hacky, not recommended, but should get you the result you expect.
Upvotes: 1