Jasper
Jasper

Reputation: 2231

Pandas, sorting days whilst preserving order

I've received a CSV file that is a combination of several other csv files.

It has a datetime index (in the format of '2017-01-16' , year, month, day) However, two problems arise.

  1. The combination was not done in order.

            Date        string        number (different)
    1     2017-01-16   stringvalue     90
    2     2017-01-16   stringvalue    912  
    3     2017-01-16   stringvalue     29 
    4     2017-01-17   stringvalue    883
    5     2017-01-17   stringvalue    223
    6     2017-01-17   stringvalue    211
                   (...)
    230   2015-04-30   stringvalue      908
    231   2015-04-29   stringvalue       28
    232   2015-04-29   stringvalue        9
    233   2015-04-30   stringvalue       98
    234   2015-04-30   stringvalue      909
                    (...)
    450   2017-03-30   stringvalue  348
    
  2. No time has been provided (the actual day is the smallest number, yet each days holds around 10 values, that need to be kept in order)

I resolved the first problem by peforming

    df = df.reset_index()
    df = df.sort_values('Date')
    df = df.set_index('Date')

This correctly orders the index, but messes up the ordering within each day. Is there a way to sort the dates, but keep the original order within the days intact?

Upvotes: 1

Views: 2808

Answers (2)

Haleemur Ali
Haleemur Ali

Reputation: 28303

df['Date'] = pd.to_datetime(df.Date)

first convert the Date column to datetime type if needed

df = df.reset_index().sort_values(by=['Date', 'index']).drop(['index'], axis=1)

This will reset the index, creating a temporary column called index. Then sort using both the Date & index columns, and finally drop the index column. leaving the data frame sorted by Date and the order in which they appeared in the original CSV file.

Upvotes: 1

BENY
BENY

Reputation: 323316

By using a new para and prefix the original order :

df['G']=df.groupby(level='Date').cumcount()
df
Out[125]: 
                 string  number  G
Date                              
2017-01-16  stringvalue      90  0
2017-01-16  stringvalue     912  1
2017-01-16  stringvalue      29  2
2017-01-17  stringvalue     883  0
2017-01-17  stringvalue     223  1
2017-01-17  stringvalue     211  2
2015-04-30  stringvalue     908  0
2017-03-30  stringvalue     348  0

df.sort_values('G').sort_index().drop('G',1)
Out[124]: 
                 string  number
Date                           
2015-04-30  stringvalue     908
2017-01-16  stringvalue      90
2017-01-16  stringvalue     912
2017-01-16  stringvalue      29
2017-01-17  stringvalue     883
2017-01-17  stringvalue     223
2017-01-17  stringvalue     211
2017-03-30  stringvalue     348

Upvotes: 1

Related Questions