EP31121PJ

Reputation: 105

Sorting date data in pandas Series

The data looks like this:

0        Thursday
1        Thursday
2        Thursday
3        Thursday
etc, etc

My code:

import pandas as pd
data_file = pd.read_csv('./data/Chicago-2016-Summary.csv')
days = data_file['day_of_week']

order = ["Monday","Tuesday","Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

sorted(days, key=lambda x: order.index(x[0]))
print(days)

This results in error:

ValueError: 'T' is not in list

I tried to sort and got this error, but I have no idea what it means.

I just want to sort the data Monday-Sunday so I can do some visualizations. Any suggestions?

Upvotes: 2

Views: 441

Answers (1)

Brad Solomon

Reputation: 40878

You can use pandas' Categorical data type for this:

order = ["Monday","Tuesday","Wednesday", "Thursday", "Friday", "Saturday", "Sunday"] 
data_file['day_of_week'] = pd.Categorical(data_file['day_of_week'], categories=order, ordered=True)
data_file.sort_values(by='day_of_week', inplace=True)
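As a rough, self-contained sketch of the same idea: the small DataFrame below is a hypothetical stand-in for the CSV, which isn't available here, and sorted_frame and sorted_days are names made up for illustration. Rows come out in the order given by the categories list rather than alphabetically.

```python
import pandas as pd

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical sample data standing in for Chicago-2016-Summary.csv
data_file = pd.DataFrame(
    {"day_of_week": ["Thursday", "Monday", "Sunday", "Tuesday", "Monday"]}
)

# Convert the column to an ordered categorical so sorting follows `order`
data_file["day_of_week"] = pd.Categorical(
    data_file["day_of_week"], categories=order, ordered=True
)

# sort_values returns a new, sorted DataFrame when inplace is not used
sorted_frame = data_file.sort_values(by="day_of_week")
sorted_days = sorted_frame["day_of_week"].tolist()
print(sorted_days)
```

Any value in the column that is not listed in order becomes NaN after the conversion, so it is worth checking for unexpected spellings first.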

In your example, be aware that when you specify

days = data_file['day_of_week']

you may be creating a view of that column (Series) within your data_file frame, so, depending on the pandas version, changes made through days can propagate back to data_file. You may want to use days = data_file['day_of_week'].copy(). Or, just work within the DataFrame as is done above.
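As for the error itself, here is a small sketch of what seems to be going on, using a hypothetical three-row Series in place of the real column. Each x passed to the key function is already a full day name, so x[0] is only its first character ('T' for 'Thursday'), which is not in order. Also, sorted() returns a new list and leaves days untouched, so printing days afterwards would show the original order anyway.

```python
import pandas as pd

order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical sample standing in for data_file['day_of_week']
days = pd.Series(["Thursday", "Monday", "Sunday"])

# x[0] is the first character of the string, not the day name
first_char = days[0][0]

try:
    sorted(days, key=lambda x: order.index(x[0]))
    raised = False
except ValueError:
    # order.index('T') fails because 'T' is not a day name
    raised = True

# Looking up the whole string works; the result is a new list
sorted_list = sorted(days, key=order.index)
print(sorted_list)
```

This plain-Python route gives a list rather than a Series, so the Categorical approach above is probably the better fit if the data stays in pandas.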

Upvotes: 3
