Erik Gibbons

Reputation: 13

Unexpected result of DataFrame sort

When I sort my DataFrame using several columns (['Symbol','Year','Month','Day']) the resulting DataFrame is sorted by Symbol > Year > Month but not by Day:

In [1]: df = pd.DataFrame({'Symbol': {79: 'F', 81: 'F', 82: 'F', 83: 'F', 84: 'F', 85: 'F', 86: 'F', 87: 'F', 89: 'F'}, 'Shares': {79: 100, 81: 100, 82: 100, 83: 100, 84: 100, 85: 100, 86: 100, 87: 100, 89: 100}, 'Month': {79: '08', 81: '08', 82: '08', 83: '08', 84: '08', 85: '08', 86: '08', 87: '08', 89: '09'}, 'Year': {79: '2008', 81: '2008', 82: '2008', 83: '2008', 84: '2008', 85: '2008', 86: '2008', 87: '2008', 89: '2008'}, 'Action': {79: 'Sell', 81: 'Sell', 82: 'Buy', 83: 'Sell', 84: 'Buy', 85: 'Sell', 86: 'Buy', 87: 'Sell', 89: 'Sell'}, 'Day': {79: 2L, 81: 4L, 82: '06', 83: 11L, 84: '13', 85: 18L, 86: '18', 87: 23L, 89: 22L}})

In [2]: df
Out[2]:
   Action Day Month  Shares Symbol  Year
79   Sell   2    08     100      F  2008
81   Sell   4    08     100      F  2008
82    Buy  06    08     100      F  2008
83   Sell  11    08     100      F  2008
84    Buy  13    08     100      F  2008
85   Sell  18    08     100      F  2008
86    Buy  18    08     100      F  2008
87   Sell  23    08     100      F  2008
89   Sell  22    09     100      F  2008

In [3]: df.sort(['Symbol','Year','Month','Day'])
Out[3]:
   Action Day Month  Shares Symbol  Year
79   Sell   2    08     100      F  2008
81   Sell   4    08     100      F  2008
83   Sell  11    08     100      F  2008
85   Sell  18    08     100      F  2008
87   Sell  23    08     100      F  2008
82    Buy  06    08     100      F  2008
84    Buy  13    08     100      F  2008
86    Buy  18    08     100      F  2008
89   Sell  22    09     100      F  2008

Why isn't sort working as expected?

Upvotes: 1

Views: 113

Answers (1)

Andy Hayden

Reputation: 375495

It's not working as you expect because the Day column holds mixed types (strings and longs). In Python 2 a string always compares as "greater than" a number, so every numeric day is ordered before every string day, which is why the sort looks like it's acting unexpectedly.
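The comparison issue can be seen without pandas. The sketch below is only an illustration: the `values` list is a made-up subset of the Day column, not the asker's data. Python 2 silently orders ints before strings, whereas Python 3 refuses to compare them and raises `TypeError`; in both cases converting to `int` first gives numeric order.

```python
# Hypothetical subset of the Day column: ints mixed with zero-padded strings.
values = [2, 4, "06", 11, "13"]

# Python 3 cannot order int against str, so sorting the raw list fails.
# (Python 2 would instead place all ints before all strings.)
try:
    sorted(values)
    outcome = "sorted"
except TypeError:
    outcome = "TypeError"

# Converting every element to int first makes the ordering purely numeric.
numeric = sorted(int(v) for v in values)
```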

You can convert this column to integers by applying int to each value:

df['Day'] = df['Day'].apply(int)

I'd consider doing this for month and year too, since in your DataFrame these are strings (and perhaps would make more sense as int):

df['Mo.'] = df['Mo.'].apply(int)
df['Year'] = df['Year'].apply(int)
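A minimal, self-contained sketch of the conversion follows. It assumes the question's column names (`Symbol`, `Year`, `Month`, `Day`) and a small made-up frame. It also uses `sort_values`, because `DataFrame.sort` was later removed from pandas (in 0.20) in favour of `sort_values`.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: Day mixes ints and strings.
df = pd.DataFrame({
    "Symbol": ["F", "F", "F", "F"],
    "Year": ["2008", "2008", "2008", "2008"],
    "Month": ["08", "08", "09", "08"],
    "Day": [11, "06", 22, "13"],
})

# Convert each date part to int so comparisons are purely numeric.
for col in ["Year", "Month", "Day"]:
    df[col] = df[col].apply(int)

# Sort on all four keys, as the question intended.
result = df.sort_values(["Symbol", "Year", "Month", "Day"])
days = result["Day"].tolist()
months = result["Month"].tolist()
```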

Then you can sort by day:

In [11]: df.sort(['Day'])
Out[11]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
8    89  2008    9   22   F   Sell     100
4    87  2008    8   23   F   Sell     100

Or sort with multiple columns:

In [12]: df.sort(['Mo.', 'Day'])
Out[12]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
4    87  2008    8   23   F   Sell     100
8    89  2008    9   22   F   Sell     100

In [13]: df.sort(['Day', 'Mo.'])
Out[13]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
8    89  2008    9   22   F   Sell     100
4    87  2008    8   23   F   Sell     100

And with the ascending argument:

In [14]: df.sort(['Mo.', 'Day'], ascending=[True, False])
Out[14]:
   Indx  Year  Mo.  Day Sym Action  Shares
4    87  2008    8   23   F   Sell     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
6    84  2008    8   13   F    Buy     100
2    83  2008    8   11   F   Sell     100
5    82  2008    8    6   F    Buy     100
1    81  2008    8    4   F   Sell     100
0    79  2008    8    2   F   Sell     100
8    89  2008    9   22   F   Sell     100

... will work as expected.
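On current pandas the same multi-column sort is written with `sort_values`, which accepts the same `ascending` list. The frame below is a made-up example with already-converted integer columns, not the asker's data.

```python
import pandas as pd

# Hypothetical frame whose Month and Day columns are already integers.
df = pd.DataFrame({"Month": [8, 8, 9, 8], "Day": [2, 23, 22, 18]})

# Month ascending, Day descending within each month.
result = df.sort_values(["Month", "Day"], ascending=[True, False])
pairs = list(zip(result["Month"].tolist(), result["Day"].tolist()))
```

Note that `sort_values` returns a new DataFrame and leaves `df` unchanged unless `inplace=True` is passed.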

Upvotes: 1
