Reputation: 16935
I have the following DataFrame (df):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
I add more column(s) by assignment:
df['mean'] = df.mean(1)
How can I move the column mean to the front, i.e. set it as the first column, leaving the order of the other columns untouched?
Upvotes: 1673
Views: 2645117
Reputation: 23381
You can also use sort_index() with a sorting key. Just like you can rearrange a list in any order you want by passing a sorting key to the built-in sorted() function, you can also rearrange pandas columns using a sorting key. However, unlike the key in sorted(), this key must be a vectorized function, which means it receives all the column labels at once and must return the whole list of sort keys at once.
For the current example, where we want to move 'mean' to the front and shift all other columns to the right, we can do the following:
1. Define a dictionary that maps each column to its position, setting the 'mean' column to the lowest integer value so that it is treated as having the lowest key.
2. Define a sorting_key function that maps the dictionary defined above onto the columns.
3. Pass the sorting_key function as the sort_index() key.
df = pd.DataFrame(np.random.rand(3, 5))
df['mean'] = df.mean(1)
mapping = {col: pos for pos, col in enumerate(df.columns)} | {'mean': -1}  # dict union needs Python 3.9+
sorting_key = lambda cols: cols.map(mapping)
df.sort_index(axis=1, key=sorting_key)  # key= needs pandas >= 1.1
This returns the frame with 'mean' as the first column, followed by columns 0 to 4 in their original order.
That said, we can always rearrange the column labels outside of pandas and simply use __getitem__ (a.k.a. []), reindex() or get() to reorder the columns accordingly. Once you reorder the column labels (outside of pandas), such as
cols = ['mean', *df.columns.drop('mean')]
# or
cols = ['mean'] + df.columns[:-1].tolist()
# or
mapping = {col: pos for pos, col in enumerate(df.columns)} | {'mean': -1}
cols = sorted(df.columns, key=mapping.get)
then it becomes a problem of selecting columns according to that list (which is a similar question to this one).
Here are some column selection methods (most of which already exist in other answers on this page) that preserve the list order:
cols = ['mean', 0, 1, 2, 3, 4]
# each line below is an independent alternative, applied to the original df
df = df[cols] # []
df = df.loc[:, cols] # .loc[]
df = df.reindex(columns=cols) # reindex()
df = df.get(cols) # get()
idx = [-1, 0, 1, 2, 3, 4]
df = df.take(idx, axis=1) # take()
df = df.iloc[:, idx] # iloc[]
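A small self-contained check, assuming pandas >= 1.1 (for the key argument of sort_index) and Python >= 3.9 (for the dict union operator). It rebuilds the question's frame and confirms that the sort_index approach and each selection method listed above produce 'mean' followed by 0 to 4, with every alternative applied to the original df rather than chained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5))
df['mean'] = df.mean(1)

# 'mean' gets the lowest sort key; all other columns keep their position as key
mapping = {col: pos for pos, col in enumerate(df.columns)} | {'mean': -1}
by_key = df.sort_index(axis=1, key=lambda cols: cols.map(mapping))

cols = ['mean', 0, 1, 2, 3, 4]   # label-based order
idx = [-1, 0, 1, 2, 3, 4]        # position-based order (-1 is the last column)

# each alternative is computed from the same, unmodified df
results = {
    'sort_index': by_key,
    '[]': df[cols],
    'loc': df.loc[:, cols],
    'reindex': df.reindex(columns=cols),
    'get': df.get(cols),
    'take': df.take(idx, axis=1),
    'iloc': df.iloc[:, idx],
}
for name, out in results.items():
    print(name, list(out.columns))
```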
Upvotes: 0
Reputation: 915
You can reorder the dataframe columns using a list of names with:
df = df.filter(['list', 'of', 'column', 'names'])
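A minimal sketch of the filter approach on the question's frame. One thing worth knowing: filter(items) keeps the labels in the order given, but silently skips labels that are not in the frame, so a typo drops a column instead of raising an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 5))
df['mean'] = df.mean(1)

# the order of the list becomes the column order
reordered = df.filter(['mean', 0, 1, 2, 3, 4])
print(list(reordered.columns))

# 'typo' is not a column: it is ignored without an error, and column 4 is simply left out
partial = df.filter(['mean', 0, 1, 2, 3, 'typo'])
print(list(partial.columns))
```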
Upvotes: 38
Reputation: 105651
For pandas >= 1.3 (Edited in 2022):
df.insert(0, 'mean', df.pop('mean'))
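How about (for pandas < 1.3, the original answer) inserting the new column at the front directly instead of assigning it first? insert() raises a ValueError if 'mean' already exists, so it cannot be called with df['mean'] after the assignment:
df.insert(0, 'mean', df.mean(1))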
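A short sketch of why pop() matters here: DataFrame.insert refuses to add a label that already exists (unless allow_duplicates=True), so the column has to be removed first. pop() removes it and returns it in one step, and insert() then modifies the frame in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 5))
df['mean'] = df.mean(1)
expected = df['mean'].copy()

# inserting a label that already exists raises ValueError
raised = False
try:
    df.insert(0, 'mean', df['mean'])
except ValueError as exc:
    raised = True
    print('insert failed:', exc)

# pop() removes the column and returns it; insert() puts it back at position 0
df.insert(0, 'mean', df.pop('mean'))
print(list(df.columns))
```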
Upvotes: 391
Reputation: 2948
Here's an example of a super easy way to do it. If you're copying the headers from Excel, use .split('\t')
Upvotes: 3
Reputation: 845
Here's a way to move one existing column that will modify the existing dataframe in place.
my_column = df.pop('column name')
df.insert(3, my_column.name, my_column)  # in place; insert() returns None
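The same pop()/insert() pair can be wrapped in a small helper that moves any column to any position. move_column is a hypothetical name used only for this sketch; like insert() itself, it mutates the frame in place, and loc counts positions after the column has been removed.

```python
import pandas as pd

def move_column(df: pd.DataFrame, col, loc: int) -> None:
    """Move column `col` to integer position `loc`, in place (hypothetical helper)."""
    series = df.pop(col)         # remove the column and keep its data
    df.insert(loc, col, series)  # re-insert it under the same name

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})
move_column(df, 'd', 1)
print(list(df.columns))  # ['a', 'd', 'b', 'c']
```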
Upvotes: 21
Reputation: 39930
Another option would be to use the set_index() method followed by reset_index(). Note that we first pop() the column we intend to move to the front of the dataframe, so that we avoid a name collision when resetting the index:
df.set_index(df.pop('column_name'), inplace=True)
df.reset_index(inplace=True)
For more details see How to change the order of dataframe columns in pandas.
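A runnable version of the set_index()/reset_index() approach on the question's frame. As an observation rather than a recommendation: the existing row index is replaced by the popped column and then reset to a default RangeIndex, so this suits frames whose index carries no information.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 5))
df['mean'] = df.mean(1)

# pop 'mean', use it as the index, then turn the index back into the first column
df.set_index(df.pop('mean'), inplace=True)
df.reset_index(inplace=True)
print(list(df.columns))
```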
Upvotes: 0
Reputation: 149
I tried making an order function, modeled on Stata's order command, with which you can reorder/move column(s). It is best to put it in a .py file (for example order.py), save it in a directory, and import the function from there:
import pandas as pd
def order(dataframe, cols, f_or_l=None, before=None, after=None):
    # Author: 김완석, author and operator of "Stata로 뚝딱뚝딱"
    # You may use or modify this, but please always credit the source
    # Modified so that cols and the before/after options accept tuples; revised error messages (2021.07.12,1)
    # Fixed a case where it did not work after reset_index() on MultiIndex columns (2021.07.12,2)
    # a single label (str/int/float/bool/tuple) is wrapped in a list
    if isinstance(cols, (str, int, float, bool, tuple)):
        cols = [cols]
    cols = list(cols)
    dd = list(dataframe.columns)
    for i in cols:
        dd.remove(i)  # remove the cols elements from the remaining columns
    if f_or_l is None and before is None and after is None:
        print('Use either the f_or_l option or the before/after option.')
        return dataframe
    if f_or_l in ('first', 'last') and not (before is None and after is None):
        print('The before/after options cannot be used together with f_or_l.')
        return dataframe
    if f_or_l == 'first':
        return dataframe[cols + dd]
    if f_or_l == 'last':
        return dataframe[dd + cols]
    if before is not None and after is not None:
        print('The before and after options cannot both be used.')
        return dataframe
    anchor = before if before is not None else after
    if isinstance(anchor, list):
        print('before/after accepts a single column only; do not pass it as a list.')
        return dataframe
    # position of the anchor column; +1 places the moved columns after it
    pos = dd.index(anchor) + (0 if before is not None else 1)
    return dataframe[dd[:pos] + cols + dd[pos:]]
The Python code below is an example of using the order function I made. I hope you can reorder column(s) easily with it :)
# module
import pandas as pd
import numpy as np
from order import order  # import the order function from order.py
# make a dataset
columns = 'a b c d e f g h i j k'.split()
data = pd.DataFrame()
for i in columns:
    data[i] = np.random.rand(5)  # placeholder values
# use order function (1): move column e to the front
print(order(data, 'e', f_or_l='first'))
# use order function (2): move column e to the end of the "data" dataframe
print(order(data, 'e', f_or_l='last'))
# use order function (3): move column i before column c in the "data" dataframe
print(order(data, 'i', before='c'))
# use order function (4): move column g after column b in the "data" dataframe
print(order(data, 'g', after='b'))
# use order function (5): move columns ['c', 'd', 'e'] after column i in the "data" dataframe
print(order(data, ['c', 'd', 'e'], after='i'))
Upvotes: -1
Reputation: 117
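I had the same idea as Dmitriy Work; clearly the easiest answer:
df["mean"] = df.mean(1)
l = list(np.arange(0, len(df.columns) - 1))  # positions of the original columns
l.insert(0, -1)  # the last column ('mean') goes first
df = df.iloc[:, l]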
Upvotes: 0
Reputation: 177
Similar to the top answer, there is an alternative using deque() and its rotate() method. The rotate method takes the last element in the list and inserts it at the beginning:
from collections import deque
columns = deque(df.columns.tolist())
columns.rotate()  # default n=1: move the last label ('mean') to the front
df = df[list(columns)]
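deque.rotate(n) generalizes this: a positive n moves the last n labels to the front, and a negative n moves the first n labels to the end. A small sketch on a toy frame (the column names are made up for illustration):

```python
from collections import deque
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3], 'd': [4]})

cols = deque(df.columns)
cols.rotate(2)            # last two labels ('c', 'd') move to the front
front = df[list(cols)]

cols = deque(df.columns)
cols.rotate(-1)           # first label ('a') moves to the end
back = df[list(cols)]
print(list(front.columns), list(back.columns))
```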
Upvotes: 1
Reputation: 6101
If your column names are too-long-to-type then you could specify the new order through a list of integers with the positions:
0 1 2 3 4 mean
0 0.397312 0.361846 0.719802 0.575223 0.449205 0.500678
1 0.287256 0.522337 0.992154 0.584221 0.042739 0.485741
2 0.884812 0.464172 0.149296 0.167698 0.793634 0.491923
3 0.656891 0.500179 0.046006 0.862769 0.651065 0.543382
4 0.673702 0.223489 0.438760 0.468954 0.308509 0.422683
5 0.764020 0.093050 0.100932 0.572475 0.416471 0.389390
6 0.259181 0.248186 0.626101 0.556980 0.559413 0.449972
7 0.400591 0.075461 0.096072 0.308755 0.157078 0.207592
8 0.639745 0.368987 0.340573 0.997547 0.011892 0.471749
9 0.050582 0.714160 0.168839 0.899230 0.359690 0.438500
Generic example:
new_order = [3,2,1,4,5,0]
df[df.columns[new_order]]  # returns a reordered copy; df itself is unchanged
3 2 1 4 mean 0
0 0.575223 0.719802 0.361846 0.449205 0.500678 0.397312
1 0.584221 0.992154 0.522337 0.042739 0.485741 0.287256
2 0.167698 0.149296 0.464172 0.793634 0.491923 0.884812
3 0.862769 0.046006 0.500179 0.651065 0.543382 0.656891
4 0.468954 0.438760 0.223489 0.308509 0.422683 0.673702
5 0.572475 0.100932 0.093050 0.416471 0.389390 0.764020
6 0.556980 0.626101 0.248186 0.559413 0.449972 0.259181
7 0.308755 0.096072 0.075461 0.157078 0.207592 0.400591
8 0.997547 0.340573 0.368987 0.011892 0.471749 0.639745
9 0.899230 0.168839 0.714160 0.359690 0.438500 0.050582
Although it might seem like I'm just explicitly typing the column names in a different order, the fact that there's a column 'mean' should make it clear that new_order relates to actual positions and not column names.
For the specific case of OP's question:
new_order = [-1,0,1,2,3,4]
df = df[df.columns[new_order]]
mean 0 1 2 3 4
0 0.500678 0.397312 0.361846 0.719802 0.575223 0.449205
1 0.485741 0.287256 0.522337 0.992154 0.584221 0.042739
2 0.491923 0.884812 0.464172 0.149296 0.167698 0.793634
3 0.543382 0.656891 0.500179 0.046006 0.862769 0.651065
4 0.422683 0.673702 0.223489 0.438760 0.468954 0.308509
5 0.389390 0.764020 0.093050 0.100932 0.572475 0.416471
6 0.449972 0.259181 0.248186 0.626101 0.556980 0.559413
7 0.207592 0.400591 0.075461 0.096072 0.308755 0.157078
8 0.471749 0.639745 0.368987 0.340573 0.997547 0.011892
9 0.438500 0.050582 0.714160 0.168839 0.899230 0.359690
The main problem with this approach is that calling the same code multiple times will create different results each time, so one needs to be careful :)
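To illustrate that caveat with a sketch: applying the same positional order twice moves the columns again, whereas a label-based order gives the same result no matter how often it runs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5))
df['mean'] = df.mean(1)

new_order = [-1, 0, 1, 2, 3, 4]
once = df[df.columns[new_order]]
twice = once[once.columns[new_order]]   # the positions now point at different labels
print(list(once.columns), list(twice.columns))

# label-based order: running it again changes nothing
by_label = ['mean'] + [c for c in df.columns if c != 'mean']
print(list(df[by_label][by_label].columns))
```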
Upvotes: 67
Reputation: 221
I wanted to bring two columns to the front of a dataframe where I do not know the names of all the columns, because they are generated by a pivot statement beforehand. So, if you are in the same situation: to bring the columns whose names you know to the front, followed by "all the other columns", I came up with the following general solution (reindex_axis was removed in pandas 1.0, so this uses reindex):
df = df.reindex(['Col1','Col2'] + list(df.columns.drop(['Col1','Col2'])), axis=1)
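The same idea wrapped in a small function, as a sketch. bring_to_front is a hypothetical helper name; Index.drop raises a KeyError if one of the requested labels is missing, which is usually preferable to silently reordering the wrong columns.

```python
import pandas as pd

def bring_to_front(df: pd.DataFrame, front: list) -> pd.DataFrame:
    """Return df with the `front` columns first and the rest in their original order."""
    rest = df.columns.drop(front)   # KeyError if a label in `front` is absent
    return df.reindex(columns=list(front) + list(rest))

# column names like these could come out of a pivot
df = pd.DataFrame({'x': [1], 'Col1': [2], 'y': [3], 'Col2': [4]})
print(list(bring_to_front(df, ['Col1', 'Col2']).columns))
```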
Upvotes: 8
Reputation: 106
A pretty straightforward solution that worked for me is to use .reindex on df.columns:
df = df[df.columns.reindex(['mean', 0, 1, 2, 3, 4])[0]]
Upvotes: 4
Reputation: 15730
This question has been answered before, but reindex_axis is deprecated (it was removed in pandas 1.0), so I would suggest using:
df = df.reindex(sorted(df.columns), axis=1)
For those who want to specify the order they want instead of just sorting them, here's the solution spelled out:
df = df.reindex(['the','order','you','want'], axis=1)
Now, how you want to sort the list of column names is really not a pandas question; it's a Python list manipulation question. There are many ways of doing that, and I think this answer has a very neat way of doing it.
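One practical difference between reindex and plain [] selection, shown in a short sketch: reindex creates an all-NaN column for a label it cannot find, while df[cols] raises a KeyError. The toy column names below just mirror the example above.

```python
import pandas as pd

df = pd.DataFrame({'the': [1, 2], 'order': [3, 4], 'you': [5, 6]})
wanted = ['the', 'order', 'you', 'want']   # 'want' is not a column

reindexed = df.reindex(wanted, axis=1)     # 'want' becomes a column of NaN
print(reindexed)

raised = False
try:
    df[wanted]
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```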
Upvotes: 70
Reputation: 1672
I think this is a slightly neater solution:
df.insert(0, 'mean', df.pop("mean"))
This solution is somewhat similar to @JoeHeffer's solution, but it is a one-liner.
Here we remove the column "mean" from the dataframe and insert it at index 0 with the same column name.
Upvotes: 29
Reputation: 1197
Suppose you have df with columns A, B, C.
The simplest way is:
df = df.reindex(['B','C','A'], axis=1)
Upvotes: 75
Reputation: 792
Hackiest method in the book
df.insert(0, "test", df["mean"])
df = df.drop(columns=["mean"]).rename(columns={"test": "mean"})
Upvotes: 4
Reputation: 2853
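A simple approach is using set(), in particular when you have a long list of columns and do not want to handle them manually. Keep in mind that a set is unordered, so the remaining columns are not guaranteed to keep their original order: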
cols = list(set(df.columns.tolist()) - set(['mean']))
cols.insert(0, 'mean')
df = df[cols]
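If the relative order of the remaining columns matters, a sketch that avoids set() is to build the list with Index.drop (or a list comprehension), both of which keep the original order. The column names are made up so that alphabetical and original order differ.

```python
import pandas as pd

df = pd.DataFrame({'b': [1], 'a': [2], 'z': [3], 'mean': [2.0]})

cols = ['mean'] + df.columns.drop('mean').tolist()
# equivalent: ['mean'] + [c for c in df.columns if c != 'mean']
df = df[cols]
print(list(df.columns))  # ['mean', 'b', 'a', 'z']
```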
Upvotes: 3
Reputation: 386
To set an existing column right/left of another, based on their names:
import numpy as np
import pandas as pd
def df_move_column(df, col_to_move, col_left_of_destiny="", right_of_col_bool=True):
    cols = list(df.columns.values)
    index_max = len(cols) - 1
    if not right_of_col_bool:
        # placing left of column "c" is like placing right of the column before "c"
        # ... except left of the 1st column: then recursively move all the others to its right
        aux = cols.index(col_left_of_destiny)
        if not aux:
            for g in [x for x in cols[::-1] if x != col_to_move]:
                df = df_move_column(
                    df,
                    col_to_move=g,
                    col_left_of_destiny=col_to_move,
                    right_of_col_bool=True,
                )
            return df
        col_left_of_destiny = cols[aux - 1]
    index_old = cols.index(col_to_move)
    index_new = 0
    if len(col_left_of_destiny):
        index_new = cols.index(col_left_of_destiny) + 1
    if index_old == index_new:
        return df
    if index_new < index_old:
        index_new = np.min([index_new, index_max])
        cols = (
            cols[:index_new]
            + [cols[index_old]]
            + cols[index_new:index_old]
            + cols[index_old + 1 :]
        )
    else:
        cols = (
            cols[:index_old]
            + cols[index_old + 1 : index_new]
            + [cols[index_old]]
            + cols[index_new:]
        )
    df = df[cols]
    return df
cols = list("ABCD")
df2 = pd.DataFrame(np.arange(4)[np.newaxis, :], columns=cols)
for k in cols:
    print(30 * "-")
    for g in [x for x in cols if x != k]:
        df_new = df_move_column(df2, k, g)
        print(f"{k} after {g}: {df_new.columns.values}")
for k in cols:
    print(30 * "-")
    for g in [x for x in cols if x != k]:
        df_new = df_move_column(df2, k, g, right_of_col_bool=False)
        print(f"{k} before {g}: {df_new.columns.values}")
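For comparison, a shorter sketch of the same operation using plain list operations: remove the label, then insert it just after (or before) the target label. move_next_to is a hypothetical name, not part of pandas.

```python
import pandas as pd

def move_next_to(df: pd.DataFrame, col, target, after: bool = True) -> pd.DataFrame:
    """Return df with `col` placed right after (or before) `target`."""
    cols = list(df.columns)
    cols.remove(col)                                # take the column out
    pos = cols.index(target) + (1 if after else 0)  # slot next to the target
    cols.insert(pos, col)
    return df[cols]

df2 = pd.DataFrame([[0, 1, 2, 3]], columns=list("ABCD"))
print(list(move_next_to(df2, 'D', 'A').columns))               # ['A', 'D', 'B', 'C']
print(list(move_next_to(df2, 'A', 'C', after=False).columns))  # ['B', 'A', 'C', 'D']
```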
Upvotes: 0
Reputation: 331
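You can use a set, which is an unordered collection of unique elements, to collect the other columns. Note that a set does not guarantee to keep the "order of the other columns untouched"; in the outputs below the order only survives because, in CPython, small integer labels happen to iterate in ascending order: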
other_columns = list(set(df.columns).difference(["mean"])) #[0, 1, 2, 3, 4]
Then, you can use a lambda to move a specific column to the front by:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.random.rand(10, 5))
In [4]: df["mean"] = df.mean(1)
In [5]: move_col_to_front = lambda df, col: df[[col]+list(set(df.columns).difference([col]))]
In [6]: move_col_to_front(df, "mean")
mean 0 1 2 3 4
0 0.697253 0.600377 0.464852 0.938360 0.945293 0.537384
1 0.609213 0.703387 0.096176 0.971407 0.955666 0.319429
2 0.561261 0.791842 0.302573 0.662365 0.728368 0.321158
3 0.518720 0.710443 0.504060 0.663423 0.208756 0.506916
4 0.616316 0.665932 0.794385 0.163000 0.664265 0.793995
5 0.519757 0.585462 0.653995 0.338893 0.714782 0.305654
6 0.532584 0.434472 0.283501 0.633156 0.317520 0.994271
7 0.640571 0.732680 0.187151 0.937983 0.921097 0.423945
8 0.562447 0.790987 0.200080 0.317812 0.641340 0.862018
9 0.563092 0.811533 0.662709 0.396048 0.596528 0.348642
In [7]: move_col_to_front(df, 2)
2 0 1 3 4 mean
0 0.938360 0.600377 0.464852 0.945293 0.537384 0.697253
1 0.971407 0.703387 0.096176 0.955666 0.319429 0.609213
2 0.662365 0.791842 0.302573 0.728368 0.321158 0.561261
3 0.663423 0.710443 0.504060 0.208756 0.506916 0.518720
4 0.163000 0.665932 0.794385 0.664265 0.793995 0.616316
5 0.338893 0.585462 0.653995 0.714782 0.305654 0.519757
6 0.633156 0.434472 0.283501 0.317520 0.994271 0.532584
7 0.937983 0.732680 0.187151 0.921097 0.423945 0.640571
8 0.317812 0.790987 0.200080 0.641340 0.862018 0.562447
9 0.396048 0.811533 0.662709 0.596528 0.348642 0.563092
Upvotes: 6
Reputation: 809
You can do that after you have added the 'mean' column to your df, as follows.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
df['mean'] = df.mean(1)
0 1 2 3 4 mean
0 0.929616 0.316376 0.183919 0.204560 0.567725 0.440439
1 0.595545 0.964515 0.653177 0.748907 0.653570 0.723143
2 0.747715 0.961307 0.008388 0.106444 0.298704 0.424512
3 0.656411 0.809813 0.872176 0.964648 0.723685 0.805347
4 0.642475 0.717454 0.467599 0.325585 0.439645 0.518551
5 0.729689 0.994015 0.676874 0.790823 0.170914 0.672463
6 0.026849 0.800370 0.903723 0.024676 0.491747 0.449473
7 0.526255 0.596366 0.051958 0.895090 0.728266 0.559587
8 0.818350 0.500223 0.810189 0.095969 0.218950 0.488736
9 0.258719 0.468106 0.459373 0.709510 0.178053 0.414752
### here you can add the line below and it should work
# Pass a list of column names (hence list(...) around the tuple). A bare tuple would be looked up as a single column label and raise a KeyError.
df = df[list(('mean',0, 1, 2,3,4))]
mean 0 1 2 3 4
0 0.440439 0.929616 0.316376 0.183919 0.204560 0.567725
1 0.723143 0.595545 0.964515 0.653177 0.748907 0.653570
2 0.424512 0.747715 0.961307 0.008388 0.106444 0.298704
3 0.805347 0.656411 0.809813 0.872176 0.964648 0.723685
4 0.518551 0.642475 0.717454 0.467599 0.325585 0.439645
5 0.672463 0.729689 0.994015 0.676874 0.790823 0.170914
6 0.449473 0.026849 0.800370 0.903723 0.024676 0.491747
7 0.559587 0.526255 0.596366 0.051958 0.895090 0.728266
8 0.488736 0.818350 0.500223 0.810189 0.095969 0.218950
9 0.414752 0.258719 0.468106 0.459373 0.709510 0.178053
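Why the list() matters, as a quick sketch: inside [], pandas treats a tuple as one (possibly MultiIndex) column key rather than as a collection of keys, so passing the tuple directly raises a KeyError, while a list selects several columns in list order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5))
df['mean'] = df.mean(1)

labels = ('mean', 0, 1, 2, 3, 4)
raised = False
try:
    df[labels]                  # tuple: looked up as a single column label
except KeyError:
    raised = True
    print('tuple: KeyError')

print(list(df[list(labels)].columns))  # list: several columns, in list order
```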
Upvotes: 6
Reputation: 5273
Just flipping helps often:
df[df.columns[::-1]]
Or just shuffle for a look:
import random
cols = list(df.columns)
random.shuffle(cols)
df[cols]
Upvotes: 5
Reputation: 1263
I think this function is more straightforward. You just need to specify a subset of columns at the start or the end or both:
def reorder_df_columns(df, start=None, end=None):
    """
    This function reorders the columns of a DataFrame.
    It takes the columns given in the list `start` and moves them to the left.
    It also takes the columns in `end` and moves them to the right.
    """
    if start is None:
        start = []
    if end is None:
        end = []
    assert isinstance(start, list) and isinstance(end, list)
    cols = list(df.columns)
    # ignore start columns that are not in the DataFrame
    start = [c for c in start if c in cols]
    # ignore end columns that are missing or already listed in start
    end = [c for c in end if c in cols and c not in start]
    # take the moved columns out of the middle section
    for c in start + end:
        cols.remove(c)
    cols = start + cols + end
    return df[cols]
Upvotes: 1
Reputation: 485
I have a very specific use case for re-ordering column names in pandas. Sometimes I am creating a new column in a dataframe that is based on an existing column. By default pandas will insert my new column at the end, but I want the new column to be inserted next to the existing column it's derived from.
def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    """
    Helper function to re-arrange the order of items in a list.
    Useful for moving a column in a pandas dataframe.
    input_list - list
    input_item_to_move - item in list to move
    input_item_insert_here - item in list, insert before
    """
    # make copy for output, make sure it's a list
    output_list = list(input_list)
    # index of item to move
    idx_move = output_list.index(input_item_to_move)
    # pop off the item to move
    itm_move = output_list.pop(idx_move)
    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)
    # insert item to move into here
    output_list.insert(idx_insert, itm_move)
    return output_list
import pandas as pd
# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
print(df)
# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
print(df)
# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df = df[ls_cols]
print('Step 3: rearrange columns')
print(df)
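An alternative sketch that avoids the reordering step altogether: look up the position of the source column with Index.get_loc and insert the derived column there directly. Here it lands right after final_odometer; dropping the + 1 places it before, as rearrange_list does.

```python
import pandas as pd

df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
})

# position just after the column the new one is derived from
pos = df.columns.get_loc('final_odometer') + 1
df.insert(pos, 'change_odometer', df['final_odometer'] - df['initial_odometer'])
print(df)
```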
Upvotes: 1
Reputation: 109696
You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
cols = ['mean'] + [col for col in df if col != 'mean']
df = df[cols]
You can also use a more general approach. In this example, the last column (indicated by -1) is inserted as the first column.
cols = [df.columns[-1]] + [col for col in df if col != df.columns[-1]]
df = df[cols]
You can also use this approach for reordering columns in a desired order if they are present in the DataFrame.
inserted_cols = ['a', 'b', 'c']
cols = ([col for col in inserted_cols if col in df]
+ [col for col in df if col not in inserted_cols])
df = df[cols]
Upvotes: 83
Reputation: 13349
import numpy as np
import pandas as pd
df = pd.DataFrame()
column_names = ['x','y','z','mean']
for col in column_names:
    df[col] = np.random.randint(0,100, size=10000)
You can try out the following solutions:
Solution 1:
df = df[ ['mean'] + [ col for col in df.columns if col != 'mean' ] ]
Solution 2:
df = df[['mean', 'x', 'y', 'z']]
Solution 3:
col = df.pop("mean")
df.insert(0, col.name, col)  # in place; insert() returns None
Solution 4:
df.set_index(df.columns[-1], inplace=True)
df.reset_index(inplace=True)
Solution 5:
cols = list(df)
cols = [cols[-1]] + cols[:-1]
df = df[cols]
Solution 6:
order = [3,0,1,2] # new column order by position ('mean' is at position 3)
df = df[[df.columns[i] for i in order]]
Timings:
Solution 1:
CPU times: user 1.05 ms, sys: 35 µs, total: 1.08 ms Wall time: 995 µs
Solution 2:
CPU times: user 933 µs, sys: 0 ns, total: 933 µs Wall time: 800 µs
Solution 3:
CPU times: user 0 ns, sys: 1.35 ms, total: 1.35 ms Wall time: 1.08 ms
Solution 4:
CPU times: user 1.23 ms, sys: 45 µs, total: 1.27 ms Wall time: 986 µs
Solution 5:
CPU times: user 1.09 ms, sys: 19 µs, total: 1.11 ms Wall time: 949 µs
Solution 6:
CPU times: user 955 µs, sys: 34 µs, total: 989 µs Wall time: 859 µs
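Single-shot timings like these are noisy at the microsecond scale. A sketch of a steadier comparison with the standard timeit module, repeating each candidate many times on the same 10,000-row frame. The candidate set is a selection of the solutions above, and absolute numbers will vary by machine and pandas version, so none are quoted here.

```python
import timeit
import numpy as np
import pandas as pd

df = pd.DataFrame({c: np.random.randint(0, 100, size=10000) for c in ['x', 'y', 'z', 'mean']})

candidates = {
    'list comprehension': lambda: df[['mean'] + [c for c in df.columns if c != 'mean']],
    'explicit list': lambda: df[['mean', 'x', 'y', 'z']],
    'reindex': lambda: df.reindex(columns=['mean', 'x', 'y', 'z']),
    'iloc': lambda: df.iloc[:, [3, 0, 1, 2]],
}

timings = {}
for name, fn in candidates.items():
    # best of 5 repeats of 200 calls each, reported per call
    best = min(timeit.repeat(fn, number=200, repeat=5)) / 200
    timings[name] = best
    print(f'{name:20s} {best * 1e6:8.1f} µs per call')
```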
Upvotes: 131
Reputation: 335
Most of the answers did not generalize enough, and the pandas reindex_axis method is a little tedious, hence I offer a simple function to move an arbitrary number of columns to any position using a dictionary where key = column name and value = position to move to. If your dataframe is large, pass True to big_data and the function will return only the ordered column list, which you can then use to slice your data.
def order_column(df, columns, big_data = False):
    """Re-Orders dataFrame column(s)
    Parameters :
    df      -- dataframe
    columns -- a dictionary:
               key   = column name
               value = position to move it to
    big_data -- boolean
                True = returns only the ordered columns as a list
                       the user can then slice the data using this
                       ordered column list
                False = default - return a copy of the dataframe
    """
    ordered_col = df.columns.tolist()
    for key, value in columns.items():
        ordered_col.remove(key)         # take the column out first
        ordered_col.insert(value, key)  # then put it at its new position
    if big_data:
        return ordered_col
    return df[ordered_col]
# e.g.
df = pd.DataFrame({'chicken wings': np.random.rand(10, 1).flatten(), 'taco': np.random.rand(10,1).flatten(),
'coffee': np.random.rand(10, 1).flatten()})
df['mean'] = df.mean(1)
df = order_column(df, {'mean': 0, 'coffee':1 })
col = order_column(df, {'mean': 0, 'coffee':1 }, True)
# col is ['mean', 'coffee', 'chicken wings', 'taco']
# you could grab it by doing this
df = df[col]
Upvotes: 2
Reputation: 9753
In your case,
df = df.reindex(columns=['mean',0,1,2,3,4])
will do exactly what you want.
In my case (general form):
df = df.reindex(columns=sorted(df.columns))
df = df.reindex(columns=(['opened'] + list([a for a in df.columns if a != 'opened']) ))
Upvotes: 212
Reputation: 1366
I ran into a similar question myself, and just wanted to add what I settled on. I liked the reindex_axis() method for changing column order. This worked (reindex_axis was removed in pandas 1.0, so this only runs on older versions):
df = df.reindex_axis(['mean'] + list(df.columns[:-1]), axis=1)
An alternate method based on the comment from @Jorge:
df = df.reindex(columns=['mean'] + list(df.columns[:-1]))
Although reindex_axis seems to be slightly faster in micro benchmarks than reindex, I think I prefer the latter for its directness.
Upvotes: 23