Reputation: 5071
I am puzzled by the behavior of sort_values() in Pandas, which does not seem to respond appropriately to the axis argument.
For a toy example:
toy.to_json()
'{"labels":{"0":7,"1":4,"2":7,"3":1,"4":5,"5":0,"6":3,"7":1,"8":4,"9":9},"companies":{"0":"Apple","1":"AIG","2":"Amazon","3":"American express","4":"Boeing","5":"Bank of America","6":"British American Tobacco","7":"Canon","8":"Caterpillar","9":"Colgate-Palmolive"}}'
toy.sort_values('labels') # this works alright
   labels                 companies
5       0           Bank of America
3       1          American express
7       1                     Canon
6       3  British American Tobacco
1       4                       AIG
8       4               Caterpillar
4       5                    Boeing
0       7                     Apple
2       7                    Amazon
9       9         Colgate-Palmolive
toy.sort_values(by='labels', axis=1) # raises an exception
KeyError: 'labels'
Upvotes: 0
Views: 650
Reputation: 8826
To make clear what axis=0 and axis=1 refer to, it helps to look at the rows and columns of a DataFrame first.
df.shape[0] # number of rows
df.shape[1] # number of columns
Let's assume a DataFrame as follows:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
... 'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
... 'col2' : [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... })
>>> df
col1 col2 col3
0 A 2 0
1 A 1 1
2 B 9 9
3 NaN 8 4
4 D 7 2
5 C 4 3
Applying df.shape shows how the two positions map to rows and columns:
>>> df.shape[0]
6 <-- the DataFrame has six rows
>>> df.shape[1]
3 <-- the DataFrame has three columns
If you just want to sort the rows by the values in a column, you don't need to specify axis at all: with the default axis=0, by is interpreted as a column name. You can simply do:
>>> df.sort_values(by=['col1'])
col1 col2 col3
0 A 2 0
1 A 1 1
2 B 9 9
5 C 4 3
4 D 7 2
3 NaN 8 4
Or you can pass multiple column names as a list to by:
>>> df.sort_values(by=['col1', 'col2'])
col1 col2 col3
1 A 1 1
0 A 2 0
2 B 9 9
5 C 4 3
4 D 7 2
3 NaN 8 4
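As a minimal sketch of the point above, the following rebuilds the same df and checks that leaving axis out gives the same result as passing axis=0 explicitly. The variable names implicit and explicit are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

# axis defaults to 0, so `by` is looked up among the column names
implicit = df.sort_values(by=['col1', 'col2'])
explicit = df.sort_values(by=['col1', 'col2'], axis=0)

print(implicit.equals(explicit))  # True
print(list(implicit.index))       # [1, 0, 2, 5, 4, 3]
```

The row with NaN in col1 ends up last, matching the output shown above.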
Upvotes: 0
Reputation: 75100
Adding on an example to the above comments and answers:
Let's assume you have a DataFrame as below:
df = pd.DataFrame(data={"labels":{"0":7,"1":4,"2":7,"3":1,"4":5},"companies":{"0":9,"1":1,"2":6,"3":1,"4":8}})
>>> df
labels companies
0 7 9
1 4 1
2 7 6
3 1 1
4 5 8
For axis=0, by takes index levels and/or column labels, and the rows are sorted:
df.sort_values(by='labels')
which gives you a sorted labels column (ascending by default).
labels companies
3 1 1
1 4 1
4 5 8
0 7 9
2 7 6
Coming to axis=1, refer to the code below:
df.sort_values('4',axis=1)
This sorts the columns so that the values in the row with index '4' are in order. Here it won't change anything, because in row '4' the labels value 5 is already less than the companies value 8, and sorting is ascending by default. However, if you execute df.sort_values('1',axis=1), where the value under labels (4) is greater than the value under companies (1), you will see that the positions of labels and companies have been exchanged.
companies labels
0 9 7
1 1 4
2 6 7
3 1 1
4 8 5
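A small runnable sketch of the two axis=1 calls described above, using the same df. Note that the index labels are the strings '0' to '4' because the frame is built from dicts with string keys. The names unchanged and swapped are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "labels": {"0": 7, "1": 4, "2": 7, "3": 1, "4": 5},
    "companies": {"0": 9, "1": 1, "2": 6, "3": 1, "4": 8},
})

# With axis=1, `by` names a row (an index label), and the columns get reordered
unchanged = df.sort_values(by="4", axis=1)  # row '4': labels=5 < companies=8
swapped = df.sort_values(by="1", axis=1)    # row '1': labels=4 > companies=1

print(list(unchanged.columns))  # ['labels', 'companies']
print(list(swapped.columns))    # ['companies', 'labels']
```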
Hope this clarifies.
Upvotes: 1
Reputation: 5727
This is because axis 0 is "down" in your example (along the rows), and axis 1 is "right" (that is, across columns).
If you look at the documentation for sort_values, you see that the first argument is indeed by, and the default value for axis is 0.
So to repeat your first example with an explicit axis, you need to execute toy.sort_values(by='labels', axis=0)
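To see both behaviors side by side, here is a sketch that rebuilds toy from the JSON in the question, assuming it has the default integer index. With axis=1, pandas looks for 'labels' among the index labels (0 to 9), does not find it, and raises the KeyError. The names by_rows and raised are only illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "labels": [7, 4, 7, 1, 5, 0, 3, 1, 4, 9],
    "companies": [
        "Apple", "AIG", "Amazon", "American express", "Boeing",
        "Bank of America", "British American Tobacco", "Canon",
        "Caterpillar", "Colgate-Palmolive",
    ],
})

# axis=0: `by` is a column name, rows are reordered
by_rows = toy.sort_values(by="labels", axis=0)
print(by_rows.iloc[0]["companies"])  # Bank of America

# axis=1: `by` must be an index label, and 'labels' is not one
try:
    toy.sort_values(by="labels", axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```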
Upvotes: 1