Reputation: 23
I want to sort dataframe by column index. The issue is my columns are 'dates' dd/mm/yyyy directly imported from my excel. For ex:
10/08/20 12/08/20 11/08/20
0 2.0 6.0 15.0
1 6.0 11.0 8.0
2 4.0 7.0 3.0
3 7.0 12.0 2.0
4 12.0 5.0 7.0
The output I want is:
10/08/20 11/08/20 12/08/20
0 2.0 15.0 6.0
1 6.0 8.0 11.0
2 4.0 3.0 7.0
3 7.0 2.0 12.0
4 12.0 7.0 5.0
I am using
df.sort_index(axis=1)
It is giving me following error:
TypeError: '<' not supported between instances of 'datetime.datetime' and 'str'
I want to do it in panda dataframe. Any help will be appreciated. Thanks
Upvotes: 1
Views: 534
Reputation: 1112
First remove the '.' at the end of date from the data shource sheet. the for this data
10-08-2020 12-08-2020 11-08-2020
0 2 6 15
1 6 11 8
2 4 7 3
3 7 12 2
4 12 5 7
try this
import datetime as dt
df.columns=pd.Series(df.columns).apply(lambda d: dt.datetime(d, dt.datetime.strptime(d, '%d/%m/%Y')))
df.sort_index(axis = 1)
Upvotes: 2
Reputation:
Your error comes from the fact that you are mixing string types with date types. Either all your column names are strings or all are dates but you cannot have both mixed.
For example
l=[[2.0, 6.0, 15.0],
[6.0, 11.0, 8.0],
[4.0, 7.0, 3.0],
[7.0, 12.0, 2.0],
[12.0, 5.0, 7.0]]
d = pd.DataFrame(l, columns =['10/08/20', '12/08/20', '11/08/20']) # column names are strings
yields
10/08/20 12/08/20 11/08/20
0 2.0 6.0 15.0
1 6.0 11.0 8.0
2 4.0 7.0 3.0
3 7.0 12.0 2.0
4 12.0 5.0 7.0
Now if I want to sort by column names I type
d.sort_index(axis = 1)
10/08/20 11/08/20 12/08/20
0 2.0 15.0 6.0
1 6.0 8.0 11.0
2 4.0 3.0 7.0
3 7.0 2.0 12.0
4 12.0 7.0 5.0
If on the other hand, column names were dates as in
from dateutil.parser import parse
d = pd.DataFrame(l, columns =[parse('10/08/20'), parse('12/08/20'), parse('11/08/20')])
we will have
2020-10-08 2020-12-08 2020-11-08 #now column names are dates
0 2.0 6.0 15.0
1 6.0 11.0 8.0
2 4.0 7.0 3.0
3 7.0 12.0 2.0
4 12.0 5.0 7.0
Again you can sort them using the same
details.sort_index(axis = 1)
2020-10-08 2020-11-08 2020-12-08
0 2.0 15.0 6.0
1 6.0 8.0 11.0
2 4.0 3.0 7.0
3 7.0 2.0 12.0
4 12.0 7.0 5.0
and will give you no error.
Upvotes: 0
Reputation: 8940
First:
df.columns = df.columns.str.replace(".", "")
Then:
df.sort_index(axis = 1)
Update: as Ch3steR mentioned in the comments. For removing "."
df.columns = df.columns.str.rstrip(".")
Using str.rstrip
to generalize, as day.month.year
is valid format, using str.replace
would replace every .
s = pd.Series(["1.2.2020."])
pd.to_datetime(s.str.replace('.', ''))
# 0 2020-12-20 # Interpeted wrong
# dtype: datetime64[ns]
pd.to_datetime(s.str.rstrip('.'))
# 0 2020-01-02
# dtype: datetime64[ns]
Upvotes: 1