Reputation: 249
How can I sort a pandas DataFrame's columns by name, but only for specific columns? My dataframe columns look like this:
+-------+-------+-----+------+------+----------+
|movieId| title |drama|horror|action| comedy |
+-------+-------+-----+------+------+----------+
| |
+-------+-------+-----+------+------+----------+
I would like to sort only the columns ['drama','horror','action','comedy'] alphabetically and leave movieId and title where they are, so that I get the following dataframe:
+-------+-------+------+------+------+----------+
|movieId| title |action|comedy|drama | horror |
+-------+-------+------+------+------+----------+
| |
+-------+-------+------+------+------+----------+
I tried df = df.sort_index(axis=1), but it sorts all columns:
+-------+-------+------+------+-------+----------+
|action | comedy|drama |horror|movieId| title |
+-------+-------+------+------+-------+----------+
| |
+-------+-------+------+------+-------+----------+
Upvotes: 2
Views: 420
Reputation: 1
Another way would be to set movieId and title as the index of the DataFrame and then sort the remaining columns by label.
df.set_index(['movieId', 'title'], inplace=True)
df.sort_index(axis=1, inplace=True)
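A minimal sketch of this approach on made-up data (the rows are hypothetical; only the column names come from the question). The final `reset_index` call is an assumption added here, for the case where movieId and title should end up as ordinary leading columns again rather than staying in the index:

```python
import pandas as pd

# Hypothetical sample rows with the column layout from the question
df = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Movie A", "Movie B"],
    "drama": [1, 0],
    "horror": [0, 1],
    "action": [1, 1],
    "comedy": [0, 0],
})

# Move the key columns into the index so the column sort does not touch them
df = df.set_index(["movieId", "title"])
# Sort the remaining columns alphabetically by label
df = df.sort_index(axis=1)
# Turn the index levels back into the two leading columns
df = df.reset_index()

result_columns = df.columns.tolist()
print(result_columns)
```

Without the `reset_index` step the frame keeps movieId and title as a MultiIndex, which may or may not be what you want downstream.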
Upvotes: 0
Reputation: 862851
Sort all columns after the second one and prepend the first 2 columns:
c = df.columns[:2].tolist() + sorted(df.columns[2:].tolist())
print (c)
['movieId', 'title', 'action', 'comedy', 'drama', 'horror']
Finally, change the order of the columns with this list:
df1 = df[c]
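Put together on a small made-up frame (the rows are hypothetical; only the column names come from the question), the two steps above might look like this:

```python
import pandas as pd

# Hypothetical rows; column names follow the question
df = pd.DataFrame(
    [[1, "Movie A", 1, 0, 1, 0], [2, "Movie B", 0, 1, 0, 1]],
    columns=["movieId", "title", "drama", "horror", "action", "comedy"],
)

# Keep the first two labels in place, sort the remaining labels alphabetically
c = df.columns[:2].tolist() + sorted(df.columns[2:].tolist())

# Selecting with a list of labels returns a new frame in that column order;
# each column keeps its own values, and df itself is left unchanged
df1 = df[c]
print(df1.columns.tolist())
```

This relies on movieId and title being the first two columns; if their position can vary, selecting them by name is safer than slicing by position.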
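Another idea is to use DataFrame.sort_index only on the columns after the first 2, selected by DataFrame.iloc, and then join the result back onto the first 2 columns with concat (assigning the sorted block back through iloc does not reorder the column labels):
df = pd.concat([df.iloc[:, :2], df.iloc[:, 2:].sort_index(axis=1)], axis=1)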
Upvotes: 1
Reputation: 634
You can explicitly rearrange the columns like so:
df[['movieId','title','drama','horror','action','comedy']]
If you have a lot of columns to sort alphabetically:
import numpy as np
df[np.concatenate([['movieId', 'title'], df.drop(['movieId', 'title'], axis=1).columns.sort_values()])]
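If this comes up often, the same idea can be wrapped in a small helper. The sketch below is only an illustration: the name `sort_columns_except` is hypothetical (not a pandas function), and it assumes the column labels are unique and mutually comparable (e.g. all strings).

```python
import pandas as pd

def sort_columns_except(df: pd.DataFrame, fixed: list) -> pd.DataFrame:
    """Return a new frame with the `fixed` columns first, in the given order,
    followed by all other columns sorted alphabetically by label."""
    # Collect every label that is not pinned, then sort those labels
    rest = sorted(col for col in df.columns if col not in fixed)
    # Selecting with a list of labels reorders the columns
    return df[list(fixed) + rest]

# Hypothetical empty frame; the pinned columns need not be in the first positions
df = pd.DataFrame(columns=["title", "drama", "movieId", "horror", "action", "comedy"])
out = sort_columns_except(df, ["movieId", "title"])
print(out.columns.tolist())
```

Unlike the positional slicing variants, this selects the pinned columns by name, so it does not depend on where movieId and title sit in the original frame.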
Upvotes: 1