Neven
Neven

Reputation: 453

Pandas GroupBy and sort_values don't work as expected

I have dataset which has columns:

Unnamed: 0         int64
id                object
number_from       object
number_to         object
time              object

And I applied this function on it: data1.groupby('number_from').apply(lambda x: x.sort_values('time'))

I get something like this:

Unnamed   id     number_from  number_to       time
17699  d20b3e          934           674  2017-07-03 06:36:20.000  
17700  d20b81          934           674  2017-07-03 06:36:22.000  
17701  d20b96          934           674  2017-07-03 06:36:23.000  
**17703  d20c17        612           235  2017-07-03 06:36:28.000**  
17707  d20db5          934           658  2017-07-03 06:36:45.000  
17708  d20de9          934           658  2017-07-03 06:36:47.000  
17710  d20e05          934           658  2017-07-03 06:36:49.000  
17711  d20e41          934           658  2017-07-03 06:36:51.000  
17712  d20e73          934           658  2017-07-03 06:36:53.000 
17713  d20ecc          934           702  2017-07-03 06:36:57.000  
17714  d20ef1          934           702  2017-07-03 06:36:59.000 
17715  d20f32          934           702  2017-07-03 06:37:01.000  
17716  d20f77          934           702  2017-07-03 06:37:03.000  
17717  d20f8d          934           702  2017-07-03 06:37:05.000 
17718  d20fd8          934           262  2017-07-03 06:37:08.000 
17719  d21017          934           262  2017-07-03 06:37:11.000  
17720  d21032          934           262  2017-07-03 06:37:12.000  
17721  d2103e          934           262  2017-07-03 06:37:13.000  
17722  d2106d          934           262  2017-07-03 06:37:15.000
**17723  d210c4        396           048  2017-07-03 06:37:19.000** 
17725  d21147          934           691  2017-07-03 06:37:24.000  
17726  d21167          934           691  2017-07-03 06:37:26.000 

Note: this is just the subset of the values in the dataframe that I have. It just sorted values and it did not grouped.

And I want to group and to sort time values in each group independently of others.

I also tried data1.sort_values(['time'], ascending=False).groupby('number_from') but the result is same. What I'm doing wrong?

Upvotes: 2

Views: 573

Answers (1)

jezrael
jezrael

Reputation: 862511

I believe you need sorting by multiple columns together, but first convert column time to datetimes:

data1['time'] = pd.to_datetime(data1['time'])
#if need to compare integers
data1['number_from'] = data1['number_from'].astype(int)

data1.sort_values(['number_from','time'])

EDIT:

Tested both solution, working same:

data1['time'] = pd.to_datetime(data1['time'])

df1 = data1.sort_values(['number_from','time'])
print (df1)
             id  number_from  number_to                time
Unnamed                                                    
17723    d210c4          396         48 2017-07-03 06:37:19
17703    d20c17          612        235 2017-07-03 06:36:28
17699    d20b3e          934        674 2017-07-03 06:36:20
17700    d20b81          934        674 2017-07-03 06:36:22
17701    d20b96          934        674 2017-07-03 06:36:23
17707    d20db5          934        658 2017-07-03 06:36:45
17708    d20de9          934        658 2017-07-03 06:36:47
17710    d20e05          934        658 2017-07-03 06:36:49
17711    d20e41          934        658 2017-07-03 06:36:51
17712    d20e73          934        658 2017-07-03 06:36:53
17713    d20ecc          934        702 2017-07-03 06:36:57
17714    d20ef1          934        702 2017-07-03 06:36:59
17715    d20f32          934        702 2017-07-03 06:37:01
17716    d20f77          934        702 2017-07-03 06:37:03
17717    d20f8d          934        702 2017-07-03 06:37:05
17718    d20fd8          934        262 2017-07-03 06:37:08
17719    d21017          934        262 2017-07-03 06:37:11
17720    d21032          934        262 2017-07-03 06:37:12
17721    d2103e          934        262 2017-07-03 06:37:13
17722    d2106d          934        262 2017-07-03 06:37:15
17725    d21147          934        691 2017-07-03 06:37:24
17726    d21167          934        691 2017-07-03 06:37:26

Added parameter group_keys=False) for avoid creating level from number_from column:

df2 = data1.groupby('number_from', group_keys=False).apply(lambda x: x.sort_values('time'))
print (df2)
             id  number_from  number_to                time
Unnamed                                                    
17723    d210c4          396         48 2017-07-03 06:37:19
17703    d20c17          612        235 2017-07-03 06:36:28
17699    d20b3e          934        674 2017-07-03 06:36:20
17700    d20b81          934        674 2017-07-03 06:36:22
17701    d20b96          934        674 2017-07-03 06:36:23
17707    d20db5          934        658 2017-07-03 06:36:45
17708    d20de9          934        658 2017-07-03 06:36:47
17710    d20e05          934        658 2017-07-03 06:36:49
17711    d20e41          934        658 2017-07-03 06:36:51
17712    d20e73          934        658 2017-07-03 06:36:53
17713    d20ecc          934        702 2017-07-03 06:36:57
17714    d20ef1          934        702 2017-07-03 06:36:59
17715    d20f32          934        702 2017-07-03 06:37:01
17716    d20f77          934        702 2017-07-03 06:37:03
17717    d20f8d          934        702 2017-07-03 06:37:05
17718    d20fd8          934        262 2017-07-03 06:37:08
17719    d21017          934        262 2017-07-03 06:37:11
17720    d21032          934        262 2017-07-03 06:37:12
17721    d2103e          934        262 2017-07-03 06:37:13
17722    d2106d          934        262 2017-07-03 06:37:15
17725    d21147          934        691 2017-07-03 06:37:24
17726    d21167          934        691 2017-07-03 06:37:26

print (df1.equals(df2))
True

Upvotes: 2

Related Questions