Reputation: 13
When I sort my DataFrame using several columns (['Symbol','Year','Month','Day']), the resulting DataFrame is sorted by Symbol > Year > Month but not by Day:
In [1]: df = pd.DataFrame({'Symbol': {79: 'F', 81: 'F', 82: 'F', 83: 'F', 84: 'F', 85: 'F', 86: 'F', 87: 'F', 89: 'F'}, 'Shares': {79: 100, 81: 100, 82: 100, 83: 100, 84: 100, 85: 100, 86: 100, 87: 100, 89: 100}, 'Month': {79: '08', 81: '08', 82: '08', 83: '08', 84: '08', 85: '08', 86: '08', 87: '08', 89: '09'}, 'Year': {79: '2008', 81: '2008', 82: '2008', 83: '2008', 84: '2008', 85: '2008', 86: '2008', 87: '2008', 89: '2008'}, 'Action': {79: 'Sell', 81: 'Sell', 82: 'Buy', 83: 'Sell', 84: 'Buy', 85: 'Sell', 86: 'Buy', 87: 'Sell', 89: 'Sell'}, 'Day': {79: 2L, 81: 4L, 82: '06', 83: 11L, 84: '13', 85: 18L, 86: '18', 87: 23L, 89: 22L}})
In [2]: df
Out[2]:
Action Day Month Shares Symbol Year
79 Sell 2 08 100 F 2008
81 Sell 4 08 100 F 2008
82 Buy 06 08 100 F 2008
83 Sell 11 08 100 F 2008
84 Buy 13 08 100 F 2008
85 Sell 18 08 100 F 2008
86 Buy 18 08 100 F 2008
87 Sell 23 08 100 F 2008
89 Sell 22 09 100 F 2008
In [3]: df.sort(['Symbol','Year','Month','Day'])
Out[3]:
Action Day Month Shares Symbol Year
79 Sell 2 08 100 F 2008
81 Sell 4 08 100 F 2008
83 Sell 11 08 100 F 2008
85 Sell 18 08 100 F 2008
87 Sell 23 08 100 F 2008
82 Buy 06 08 100 F 2008
84 Buy 13 08 100 F 2008
86 Buy 18 08 100 F 2008
89 Sell 22 09 100 F 2008
Why isn't sort working as expected?
Upvotes: 1
Views: 113
Reputation: 375495
It's not working as you expect because the Day values are stored as mixed types (strings and longs). In Python 2, strings always compare as "greater than" numbers, so every numeric day sorts before every string day, which is why the sort looks like it's acting unexpectedly.
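To confirm that a column really holds mixed types, one option is to count the Python type of each value. This is a minimal sketch, assuming a small stand-in Series named `day` (hypothetical, not the full DataFrame from the question) and Python 3, where the separate `long` type no longer exists:

```python
import pandas as pd

# Hypothetical miniature of the Day column: a mix of ints and strings.
day = pd.Series([2, 4, '06', 11, '13'], name='Day')

# A column holding both ints and strings falls back to the generic object dtype.
print(day.dtype)

# Count how many values of each Python type the column holds.
type_counts = day.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

More than one entry in `type_counts` is a sign that the column should be converted to a single type before sorting.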
You can convert this column to integers by applying int:
df['Day'] = df['Day'].apply(int)
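I'd consider doing this for month and year too, since in your DataFrame these are strings (and would perhaps make more sense as ints). The month column is labelled 'Mo.' in the examples below; substitute 'Month' if that is the name in your DataFrame: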
df['Mo.'] = df['Mo.'].apply(int)
df['Year'] = df['Year'].apply(int)
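As an alternative to calling `apply(int)` once per column, the same conversion can probably be done in one step with `DataFrame.astype`. This is a sketch under assumptions: the three-row frame below is made up for illustration, and it uses the column names from the question ('Year', 'Month', 'Day'):

```python
import pandas as pd

# Hypothetical sample: Year and Month are strings, Day mixes ints and strings.
df = pd.DataFrame({
    'Year': ['2008', '2008', '2008'],
    'Month': ['08', '08', '09'],
    'Day': [2, '06', 22],
})

# astype accepts a column -> dtype mapping; numeric strings such as '06'
# are parsed and values that are already ints pass through unchanged.
df = df.astype({'Year': int, 'Month': int, 'Day': int})
print(df.dtypes)
```

If some values might not be numeric at all, `pd.to_numeric(df['Day'], errors='coerce')` is another option; it turns unparseable values into NaN instead of raising.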
Then you can sort by day:
In [11]: df.sort(['Day'])
Out[11]:
Indx Year Mo. Day Sym Action Shares
0 79 2008 8 2 F Sell 100
1 81 2008 8 4 F Sell 100
5 82 2008 8 6 F Buy 100
2 83 2008 8 11 F Sell 100
6 84 2008 8 13 F Buy 100
3 85 2008 8 18 F Sell 100
7 86 2008 8 18 F Buy 100
8 89 2008 9 22 F Sell 100
4 87 2008 8 23 F Sell 100
Or sort with multiple columns:
In [12]: df.sort(['Mo.', 'Day'])
Out[12]:
Indx Year Mo. Day Sym Action Shares
0 79 2008 8 2 F Sell 100
1 81 2008 8 4 F Sell 100
5 82 2008 8 6 F Buy 100
2 83 2008 8 11 F Sell 100
6 84 2008 8 13 F Buy 100
3 85 2008 8 18 F Sell 100
7 86 2008 8 18 F Buy 100
4 87 2008 8 23 F Sell 100
8 89 2008 9 22 F Sell 100
In [13]: df.sort(['Day', 'Mo.'])
Out[13]:
Indx Year Mo. Day Sym Action Shares
0 79 2008 8 2 F Sell 100
1 81 2008 8 4 F Sell 100
5 82 2008 8 6 F Buy 100
2 83 2008 8 11 F Sell 100
6 84 2008 8 13 F Buy 100
3 85 2008 8 18 F Sell 100
7 86 2008 8 18 F Buy 100
8 89 2008 9 22 F Sell 100
4 87 2008 8 23 F Sell 100
And with the ascending argument:
In [14]: df.sort(['Mo.', 'Day'], ascending=[True, False])
Out[14]:
Indx Year Mo. Day Sym Action Shares
4 87 2008 8 23 F Sell 100
3 85 2008 8 18 F Sell 100
7 86 2008 8 18 F Buy 100
6 84 2008 8 13 F Buy 100
2 83 2008 8 11 F Sell 100
5 82 2008 8 6 F Buy 100
1 81 2008 8 4 F Sell 100
0 79 2008 8 2 F Sell 100
8 89 2008 9 22 F Sell 100
... will work as expected.
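Note that `DataFrame.sort` was removed from later pandas releases in favour of `DataFrame.sort_values`. The following is a minimal sketch of the same approach on a current pandas, assuming a made-up five-row frame with the question's column names rather than the original data:

```python
import pandas as pd

# Hypothetical sample with the same problem: Day mixes ints and strings.
df = pd.DataFrame({
    'Symbol': ['F'] * 5,
    'Year': ['2008'] * 5,
    'Month': ['08', '08', '08', '08', '09'],
    'Day': [23, '06', 2, '13', 22],
})

# Convert the date parts to integers so they compare numerically.
df = df.astype({'Year': int, 'Month': int, 'Day': int})

# sort_values takes the same list of columns that sort did.
by_date = df.sort_values(['Symbol', 'Year', 'Month', 'Day'])
print(by_date)

# One ascending flag per column: Month ascending, Day descending.
newest_first = df.sort_values(['Month', 'Day'], ascending=[True, False])
print(newest_first)
```

Both calls return a new DataFrame and leave `df` itself in its original row order.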
Upvotes: 1