Anmol Sureka
Anmol Sureka

Reputation: 23

Sort Pandas dataframe column index by date

I want to sort dataframe by column index. The issue is my columns are 'dates' dd/mm/yyyy directly imported from my excel. For ex:

    10/08/20  12/08/20 11/08/20
0   2.0        6.0       15.0
1   6.0        11.0      8.0
2   4.0        7.0       3.0
3   7.0        12.0      2.0
4   12.0       5.0       7.0

The output I want is:

    10/08/20  11/08/20 12/08/20
0   2.0        15.0      6.0
1   6.0        8.0       11.0
2   4.0        3.0       7.0
3   7.0        2.0       12.0
4   12.0       7.0       5.0

I am using

df.sort_index(axis=1)

It is giving me following error:

TypeError: '<' not supported between instances of 'datetime.datetime' and 'str'

I want to do it in panda dataframe. Any help will be appreciated. Thanks

Upvotes: 1

Views: 534

Answers (3)

Kuldip Chaudhari
Kuldip Chaudhari

Reputation: 1112

First remove the '.' at the end of date from the data shource sheet. the for this data

    10-08-2020  12-08-2020  11-08-2020
0   2           6           15
1   6           11          8
2   4           7           3
3   7           12          2
4   12          5           7

try this

import datetime as dt
df.columns=pd.Series(df.columns).apply(lambda d: dt.datetime(d, dt.datetime.strptime(d, '%d/%m/%Y')))
df.sort_index(axis = 1)

Upvotes: 2

user3103059
user3103059

Reputation:

Your error comes from the fact that you are mixing string types with date types. Either all your column names are strings or all are dates but you cannot have both mixed.

For example

l=[[2.0, 6.0, 15.0],
   [6.0, 11.0, 8.0],
   [4.0, 7.0, 3.0],
   [7.0, 12.0, 2.0],
   [12.0, 5.0, 7.0]]

d = pd.DataFrame(l, columns =['10/08/20',  '12/08/20', '11/08/20']) # column names are strings

yields

   10/08/20  12/08/20  11/08/20
0       2.0       6.0      15.0
1       6.0      11.0       8.0
2       4.0       7.0       3.0
3       7.0      12.0       2.0
4      12.0       5.0       7.0

Now if I want to sort by column names I type

d.sort_index(axis = 1)

   10/08/20  11/08/20  12/08/20
0       2.0      15.0       6.0
1       6.0       8.0      11.0
2       4.0       3.0       7.0
3       7.0       2.0      12.0
4      12.0       7.0       5.0

If on the other hand, column names were dates as in

from dateutil.parser import parse
d = pd.DataFrame(l, columns =[parse('10/08/20'),  parse('12/08/20'), parse('11/08/20')])

we will have

   2020-10-08  2020-12-08  2020-11-08   #now column names are dates
0         2.0         6.0        15.0
1         6.0        11.0         8.0
2         4.0         7.0         3.0
3         7.0        12.0         2.0
4        12.0         5.0         7.0

Again you can sort them using the same

details.sort_index(axis = 1)
  
   2020-10-08  2020-11-08  2020-12-08
0         2.0        15.0         6.0
1         6.0         8.0        11.0
2         4.0         3.0         7.0
3         7.0         2.0        12.0
4        12.0         7.0         5.0

and will give you no error.

Upvotes: 0

AtanuCSE
AtanuCSE

Reputation: 8940

First:

df.columns = df.columns.str.replace(".", "")

Then:

df.sort_index(axis = 1)

Update: as Ch3steR mentioned in the comments. For removing "."

df.columns = df.columns.str.rstrip(".")

Using str.rstrip to generalize, as day.month.year is valid format, using str.replace would replace every .

Example:

s = pd.Series(["1.2.2020."])
pd.to_datetime(s.str.replace('.', ''))
# 0   2020-12-20         # Interpeted wrong
# dtype: datetime64[ns]

pd.to_datetime(s.str.rstrip('.'))
# 0   2020-01-02
# dtype: datetime64[ns]

Upvotes: 1

Related Questions