Reputation: 5384
This is a sort of an opposite question from ones I could find with similar wording, such as:
Say, I have this DataFrame:
import pandas as pd
df = pd.DataFrame({
'model': ['Punto', 'Doblo', 'Panda', 'Doblo','Punto', 'Tipo'] ,
'timestamp': ['20200124_083155', '20200124_122052', '20200124_134350', '20200124_150801', '20200124_163540', '20200124_195955']
})
print(df)
This prints out:
model timestamp
0 Punto 20200124_083155
1 Doblo 20200124_122052
2 Panda 20200124_134350
3 Doblo 20200124_150801
4 Punto 20200124_163540
5 Tipo 20200124_195955
What I would like to obtain is: first sort by timestamp; then in order of appearance in that sort, group by order of appearance - but without the extra "group" columns that the pandas .groupby
clause would add; that is, I would like to obtain the final output:
model timestamp
0 Punto 20200124_083155
1 Punto 20200124_163540
2 Doblo 20200124_122052
3 Doblo 20200124_150801
4 Panda 20200124_134350
5 Tipo 20200124_195955
How can I achieve that?
Upvotes: 1
Views: 1530
Reputation: 862511
I think this is possible by ordered categoricals, set order by sorted timestamp
values in first step and then sorting by both columns by DataFrame.sort_values
:
c = df.sort_values('timestamp')['model'].unique()
df['model'] = pd.Categorical(df['model'], ordered=True, categories=c)
df = df.sort_values(['model','timestamp'])
print (df)
model timestamp
0 Punto 20200124_083155
4 Punto 20200124_163540
1 Doblo 20200124_122052
3 Doblo 20200124_150801
2 Panda 20200124_134350
5 Tipo 20200124_195955
Upvotes: 2