Reputation: 16935

How to change the order of DataFrame columns?

I have the following DataFrame (df):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))

I add more column(s) by assignment:

df['mean'] = df.mean(1)

How can I move the column mean to the front, i.e. set it as first column leaving the order of the other columns untouched?

Upvotes: 1673

Answers (30)

Waldeyr Mendes da Silva

Reputation: 1165

df = df.reindex(columns=["second", "first"])
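A short sketch of one difference worth knowing about. The frame below is made up to match the labels in this answer. reindex(columns=...) turns any label that does not exist yet into a new all-NaN column, whereas plain [] selection raises a KeyError, which makes typos visible.

```python
import pandas as pd

# Hypothetical two-column frame matching the labels used above
df = pd.DataFrame({"first": [1, 2], "second": [3, 4]})

# Reordering with reindex: existing columns are moved and the order follows the list
swapped = df.reindex(columns=["second", "first"])
print(list(swapped.columns))  # ['second', 'first']

# A label that does not exist (e.g. a typo) silently becomes an all-NaN column
typo = df.reindex(columns=["second", "frist"])
print(typo["frist"].isna().all())  # True

# Plain [] selection raises instead, which surfaces the typo
try:
    df[["second", "frist"]]
except KeyError as err:
    print("KeyError:", err)
```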

Upvotes: 3

cottontail

Reputation: 23381

Sort columns (inside pandas) using `sort_index()`

You can also use sort_index() with a sorting key. Just as you can rearrange a list in any order you want by passing a sorting key to the built-in sorted() function, you can rearrange pandas columns by passing a key to sort_index(). However, unlike the key in sorted(), this key must be vectorized: it receives the whole column Index at once and must return an Index of sort keys of the same length.

For the current example, where we want to move 'mean' to the front and shift all other columns to the right, we can do the following:

1. Create a dictionary that maps each column label to an integer. Give 'mean' the lowest integer value so that it sorts first.
2. Define a sorting_key function that maps the columns through the dictionary defined above.
3. Pass sorting_key as the key argument of sort_index().

df = pd.DataFrame(np.random.rand(3, 5))
df['mean'] = df.mean(1)
mapping = {col: pos for pos, col in enumerate(df.columns)} | {'mean': -1}  # dict union (|) needs Python 3.9+
sorting_key = lambda cols: cols.map(mapping)
df.sort_index(axis=1, key=sorting_key)

This returns a new DataFrame with 'mean' as the first column, followed by columns 0 to 4 in their original order.
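The key above gives every column a unique integer, so ties never occur. A variant sketch for moving several columns to the front at once, assuming a hypothetical list front: map each label to a boolean (False for the front columns, which sorts first) and request a stable sort, so columns keep their current relative order within each group. kind="mergesort" is a stable sort; the key argument of sort_index() needs pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5))
df["mean"] = df.mean(axis=1)

# Columns to bring to the front (hypothetical choice for illustration)
front = ["mean", 3]

# The key receives the whole column Index; each label maps to False (front) or True (rest).
# False sorts before True, and the stable mergesort keeps the existing order within each group.
reordered = df.sort_index(
    axis=1,
    key=lambda cols: cols.map(lambda c: c not in front),
    kind="mergesort",
)
print(list(reordered.columns))  # [3, 'mean', 0, 1, 2, 4]
```

Note that the front columns appear in their existing order (3 before 'mean'), not in the order of the front list.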

Sort columns outside of pandas and select columns

That said, we can always rearrange the columns outside of pandas and simply use __getitem__ a.k.a [], reindex() or get() to reorder the columns accordingly. Once you reorder the column labels (outside of pandas) such as

cols = ['mean', *df.columns.drop('mean')]

# or
cols = ['mean'] + df.columns[:-1].tolist()

# or
mapping = {col: pos for pos, col in enumerate(df.columns)} | {'mean': -1}
cols = sorted(df.columns, key=mapping.get)

then it becomes a problem of selecting columns in that order (which is similar to this question).

Here are some column selection methods (most of which already exist in other answers on this page) that preserve the list order:

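label-based (each line is an alternative; any one of them is enough):

cols = ['mean', 0, 1, 2, 3, 4]
df = df[cols]                          # []
df = df.loc[:, cols]                   # .loc[]
df = df.reindex(columns=cols)          # reindex()
df = df.get(cols)                      # get()

index-based (positions refer to the original column order, so apply only one of these, to the original df):

idx = [-1, 0, 1, 2, 3, 4]
df = df.take(idx, axis=1)              # take()
df = df.iloc[:, idx]                   # iloc[]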
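A quick sanity check, assuming OP's frame layout: applied to the same original df, all six selection methods listed above should produce identical frames. The positional list idx only means "mean first" relative to the original column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5))
df["mean"] = df.mean(axis=1)

cols = ["mean", 0, 1, 2, 3, 4]   # label order
idx = [-1, 0, 1, 2, 3, 4]        # positional order, relative to the ORIGINAL frame

# Each method is applied to the original df, never to an already-reordered result
results = {
    "[]": df[cols],
    "loc": df.loc[:, cols],
    "reindex": df.reindex(columns=cols),
    "get": df.get(cols),
    "take": df.take(idx, axis=1),
    "iloc": df.iloc[:, idx],
}
for name, out in results.items():
    assert list(out.columns) == cols, name
    pd.testing.assert_frame_equal(out, results["[]"])
print("all six methods agree")
```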

Upvotes: 0

Sam Murphy

Reputation: 915

You can reorder the dataframe columns using a list of names with:

df = df.filter(['list', 'of', 'column', 'names'])

pandas .filter docs
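Note that filter() selects as well as reorders: columns you do not list are dropped, and listed labels that do not exist are silently skipped. A small sketch with a made-up three-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# filter() keeps only the listed labels, in the listed order ('b' is dropped)
print(list(df.filter(["c", "a"]).columns))         # ['c', 'a']

# Labels that don't exist are ignored (no KeyError, no NaN column)
print(list(df.filter(["c", "a", "zzz"]).columns))  # ['c', 'a']

# To reorder without losing columns, list the remaining ones too
wanted = ["c"]
full = df.filter(wanted + [col for col in df.columns if col not in wanted])
print(list(full.columns))                          # ['c', 'a', 'b']
```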

Upvotes: 38

Wes McKinney

Reputation: 105651

For pandas >= 1.3 (Edited in 2022):

df.insert(0, 'mean', df.pop('mean'))

Original answer (for pandas < 1.3):

df.insert(0, 'mean', df['mean'])

https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#column-selection-addition-deletion
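Why the pop() matters: according to the pandas documentation, insert() raises a ValueError when the column is already present, unless allow_duplicates=True is passed. A minimal sketch on a frame shaped like OP's (4 rows, 3 random columns plus 'mean'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 3))
df["mean"] = df.mean(axis=1)

# insert() refuses a label that is already present (unless allow_duplicates=True) ...
try:
    df.insert(0, "mean", df["mean"])
except ValueError as err:
    print("ValueError:", err)

# ... so pop() the column first: it removes 'mean' and returns it as a Series,
# then insert() puts it back at position 0. Both calls modify df in place.
df.insert(0, "mean", df.pop("mean"))
print(list(df.columns))  # ['mean', 0, 1, 2]
```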

Upvotes: 391

Chadee Fouad

Reputation: 2948

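Here's an example of a super easy way to do it. split() with no argument splits on any whitespace; if you're copying the headers from Excel and some of them contain spaces, use .split('\t') instead so that only the tabs separate the names.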

df = df['FILE_NAME DISPLAY_PATH SHAREPOINT_PATH RETAILER LAST_UPDATE'.split()]

Upvotes: 3

Joe Heffer

Reputation: 845

Here's a way to move one existing column that will modify the existing dataframe in place.

my_column = df.pop('column name')
df.insert(3, my_column.name, my_column)  # Is in-place
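The same pop()/insert() pair can be wrapped in a small reusable helper. move_column below is a hypothetical name, not a pandas API; pos is interpreted after the column has been popped, i.e. as a position among the remaining columns.

```python
import pandas as pd

def move_column(df: pd.DataFrame, col, pos: int) -> None:
    """Hypothetical helper: move `col` to integer position `pos`, modifying df in place."""
    series = df.pop(col)          # remove the column (columns after it shift left)
    df.insert(pos, col, series)   # re-insert it at the requested position

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
move_column(df, "a", 2)
print(list(df.columns))  # ['b', 'c', 'a', 'd']
move_column(df, "d", 0)
print(list(df.columns))  # ['d', 'b', 'c', 'a']
```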

Upvotes: 21

Giorgos Myrianthous

Reputation: 39930

Another option would be to use set_index() method followed by a reset_index(). Note that we first pop() the column we intend to move to the front of the dataframe, so that we avoid name collision when resetting the index:

df.set_index(df.pop('column_name'), inplace=True)
df.reset_index(inplace=True)
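One side effect to keep in mind: set_index() replaces the current index, and reset_index() then gives the frame a fresh default RangeIndex, so any meaningful row labels are lost. A sketch with a made-up frame whose index is 10, 20, 30:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 20, 30])

# Move 'b' to the front via the index round trip described above
df.set_index(df.pop("b"), inplace=True)
df.reset_index(inplace=True)

print(list(df.columns))  # ['b', 'a']
print(list(df.index))    # [0, 1, 2] (the original labels 10, 20, 30 are gone)
```

If the original frame already has a default RangeIndex, this makes no practical difference.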

For more details see How to change the order of dataframe columns in pandas.

Upvotes: 0

sanzo213

Reputation: 149

I tried making an order function, modeled on Stata's order command, that lets you reorder/move column(s). It is best to save it in a .py file (e.g. order.py) in your working directory and import the function from there.

import pandas as pd

def order(dataframe, cols, f_or_l=None, before=None, after=None):

    # Author: 김완석, author of the book "Stata로 뚝딱뚝딱", runs blog.naver.com/sanzo213
    # Feel free to use or modify this code, but please credit the source.
    # Modified so that tuples can be passed to cols and to before/after; fixed the error messages (2021.07.12,1)
    # Fixed a bug where it did not work after reset_index() on a DataFrame with MultiIndex columns (2021.07.12,2)

    # Wrap a single label (a tuple is a single label for MultiIndex columns) in a list
    if isinstance(cols, (str, int, float, bool, tuple)):
        cols = [cols]
    cols = list(cols)

    dd = list(dataframe.columns)
    for i in cols:
        dd.remove(i)  # remove the cols elements; dd now holds the remaining columns

    has_anchor = (before is not None) or (after is not None)
    if f_or_l not in (None, 'first', 'last'):
        raise ValueError("f_or_l must be 'first' or 'last'.")
    if f_or_l is None and not has_anchor:
        raise ValueError('Use either the f_or_l option or the before/after option.')
    if f_or_l is not None and has_anchor:
        raise ValueError('The before/after options cannot be used together with f_or_l.')
    if before is not None and after is not None:
        raise ValueError('The before and after options cannot both be used.')

    if f_or_l == 'first':
        new_order = cols + dd
    elif f_or_l == 'last':
        new_order = dd + cols
    elif before is not None:
        if isinstance(before, list):
            raise ValueError('The before option takes a single column; do not pass it as a list.')
        pos = dd.index(before)
        new_order = dd[:pos] + cols + dd[pos:]
    else:
        if isinstance(after, list):
            raise ValueError('The after option takes a single column; do not pass it as a list.')
        pos = dd.index(after) + 1
        new_order = dd[:pos] + cols + dd[pos:]

    return dataframe[new_order]

The Python code below shows how to use the order function I made. I hope it makes reordering column(s) easy :)

# module

import pandas as pd
import numpy as np
from order import order # call order function from order.py file

# make a dataset

columns='a b c d e f g h i j k'.split()
dic={}

n=-1
for i in columns:
    
    n+=1
    dic[i]=list(range(1+n,10+1+n))
data=pd.DataFrame(dic)
print(data)

# use order function (1): move column e to the front

data2=order(data,'e',f_or_l='first')
print(data2)

# use order function (2): move column e to the end of the "data" dataframe

print(order(data,'e',f_or_l='last'))


# use order function (3): move column i before column c in the "data" dataframe

print(order(data,'i',before='c'))


# use order function (4): move column g after column b in the "data" dataframe

print(order(data,'g',after='b'))

# use order function (5): move columns ['c', 'd', 'e'] after column i in the "data" dataframe

print(order(data,['c', 'd', 'e'],after='i'))

Upvotes: -1

S.Baum

Reputation: 117

I had the same idea as Dmitriy Work; this is clearly the easiest answer:

df["mean"] = df.mean(1)
l = list(range(len(df.columns) - 1))  # positions of all columns except the last ('mean')
l.insert(0, -1)                       # put the last column (-1) first
df = df.iloc[:, l]

Upvotes: 0

abc

Reputation: 177

Similar to the top answer, there is an alternative using deque() and its rotate() method. Called with no argument, rotate() rotates the deque one step to the right, i.e. it moves the last element to the front:

from collections import deque

columns = deque(df.columns.tolist())
columns.rotate()

df = df[list(columns)]  # convert the deque back to a list for column selection

Upvotes: 1

Yuca

Reputation: 6101

If your column names are too-long-to-type then you could specify the new order through a list of integers with the positions:

Data:

          0         1         2         3         4      mean
0  0.397312  0.361846  0.719802  0.575223  0.449205  0.500678
1  0.287256  0.522337  0.992154  0.584221  0.042739  0.485741
2  0.884812  0.464172  0.149296  0.167698  0.793634  0.491923
3  0.656891  0.500179  0.046006  0.862769  0.651065  0.543382
4  0.673702  0.223489  0.438760  0.468954  0.308509  0.422683
5  0.764020  0.093050  0.100932  0.572475  0.416471  0.389390
6  0.259181  0.248186  0.626101  0.556980  0.559413  0.449972
7  0.400591  0.075461  0.096072  0.308755  0.157078  0.207592
8  0.639745  0.368987  0.340573  0.997547  0.011892  0.471749
9  0.050582  0.714160  0.168839  0.899230  0.359690  0.438500

Generic example:

new_order = [3,2,1,4,5,0]
print(df[df.columns[new_order]])  

          3         2         1         4      mean         0
0  0.575223  0.719802  0.361846  0.449205  0.500678  0.397312
1  0.584221  0.992154  0.522337  0.042739  0.485741  0.287256
2  0.167698  0.149296  0.464172  0.793634  0.491923  0.884812
3  0.862769  0.046006  0.500179  0.651065  0.543382  0.656891
4  0.468954  0.438760  0.223489  0.308509  0.422683  0.673702
5  0.572475  0.100932  0.093050  0.416471  0.389390  0.764020
6  0.556980  0.626101  0.248186  0.559413  0.449972  0.259181
7  0.308755  0.096072  0.075461  0.157078  0.207592  0.400591
8  0.997547  0.340573  0.368987  0.011892  0.471749  0.639745
9  0.899230  0.168839  0.714160  0.359690  0.438500  0.050582

Although it might seem like I'm just explicitly typing the column names in a different order, the fact that there's a column 'mean' should make it clear that new_order relates to actual positions and not column names.

For the specific case of OP's question:

new_order = [-1,0,1,2,3,4]
df = df[df.columns[new_order]]
print(df)

       mean         0         1         2         3         4
0  0.500678  0.397312  0.361846  0.719802  0.575223  0.449205
1  0.485741  0.287256  0.522337  0.992154  0.584221  0.042739
2  0.491923  0.884812  0.464172  0.149296  0.167698  0.793634
3  0.543382  0.656891  0.500179  0.046006  0.862769  0.651065
4  0.422683  0.673702  0.223489  0.438760  0.468954  0.308509
5  0.389390  0.764020  0.093050  0.100932  0.572475  0.416471
6  0.449972  0.259181  0.248186  0.626101  0.556980  0.559413
7  0.207592  0.400591  0.075461  0.096072  0.308755  0.157078
8  0.471749  0.639745  0.368987  0.340573  0.997547  0.011892
9  0.438500  0.050582  0.714160  0.168839  0.899230  0.359690

The main problem with this approach is that calling the same code multiple times will create different results each time, so one needs to be careful :)
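To make the caveat concrete: because new_order holds positions, applying it to an already reordered frame moves the columns again. A label-based version (mean_first is a hypothetical helper) gives the same result however many times it runs. Sketch with a made-up three-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "mean": [1.5]})
new_order = [-1, 0, 1]  # positions, not labels: "last column first"

once = df[df.columns[new_order]]
twice = once[once.columns[new_order]]  # running the same positional line again
print(list(once.columns))   # ['mean', 'a', 'b']
print(list(twice.columns))  # ['b', 'mean', 'a']

def mean_first(frame: pd.DataFrame) -> pd.DataFrame:
    # Label-based: the result depends on names, not on the current positions
    return frame[["mean"] + [c for c in frame.columns if c != "mean"]]

print(list(mean_first(mean_first(df)).columns))  # ['mean', 'a', 'b']
```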

Upvotes: 67

matthhias

Reputation: 221

I wanted to bring two columns to the front of a DataFrame whose column names I did not fully know, because they were generated by an earlier pivot. If you are in the same situation, i.e. you want to bring a few columns whose names you know to the front, followed by "all the other columns", here is the general solution I came up with:

df = df.reindex_axis(['Col1','Col2'] + list(df.columns.drop(['Col1','Col2'])), axis=1)
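reindex_axis was deprecated and then removed in pandas 1.0, so the line above fails on current pandas. A sketch of the same idea with reindex(columns=...), using a made-up frame standing in for the pivot output:

```python
import pandas as pd

# Hypothetical pivot output whose column names are not all known in advance
df = pd.DataFrame({"x": [1], "Col2": [2], "y": [3], "Col1": [4]})

front = ["Col1", "Col2"]
# reindex_axis no longer exists in current pandas; reindex(columns=...) does the same job.
# Index.drop() removes the front columns and keeps the rest in their original order.
df = df.reindex(columns=front + list(df.columns.drop(front)))
print(list(df.columns))  # ['Col1', 'Col2', 'x', 'y']
```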

Upvotes: 8

ZEE

Reputation: 186

How about using T?

df = df.T.reindex(['mean', 0, 1, 2, 3, 4]).T
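This works well for OP's all-float frame. With mixed column dtypes, however, the transposed frame has to hold everything in a single dtype, so the columns come back as object. A small sketch with a made-up int/string frame; selecting the columns directly avoids the round trip:

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2], "label": ["x", "y"]})

# Round-tripping through .T forces a single (object) dtype on the transposed frame ...
roundtrip = df.T.reindex(["label", "n"]).T
# ... so the integer column comes back as object
print(roundtrip["n"].dtype)  # object

# Reordering the columns directly keeps the original dtypes
direct = df[["label", "n"]]
print(direct["n"].dtype)     # int64
```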

Upvotes: 2

CSQL

Reputation: 106

A pretty straightforward solution that worked for me is to use .reindex on df.columns:

df = df[df.columns.reindex(['mean', 0, 1, 2, 3, 4])[0]]

Upvotes: 4

dmvianna

Reputation: 15730

This question has been answered before, but reindex_axis is now deprecated, so I would suggest using:

df = df.reindex(sorted(df.columns), axis=1)

For those who want to specify the order they want instead of just sorting them, here's the solution spelled out:

df = df.reindex(['the','order','you','want'], axis=1)

Now, how you want to sort the list of column names is really not a pandas question, that's a Python list manipulation question. There are many ways of doing that, and I think this answer has a very neat way of doing it.
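One caveat for OP's frame specifically: its labels mix integers (0 to 4) and a string ('mean'), and Python 3 refuses to compare int with str, so sorted(df.columns) raises a TypeError. A sketch that sorts on the string form of each label instead (keep in mind that '10' sorts before '2' this way):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(2, 3))
df["mean"] = df.mean(axis=1)

# Python 3 cannot compare int and str, so sorting these mixed labels fails
try:
    sorted(df.columns)
except TypeError as err:
    print("TypeError:", err)

# Sorting on the string form of each label works
df = df.reindex(sorted(df.columns, key=str), axis=1)
print(list(df.columns))  # [0, 1, 2, 'mean']
```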

Upvotes: 70

erncyp

Reputation: 1672

I think this is a slightly neater solution:

df.insert(0, 'mean', df.pop("mean"))

This solution is somewhat similar to @JoeHeffer 's solution but this is one liner.

Here we remove the column "mean" from the dataframe and attach it to index 0 with the same column name.

Upvotes: 29

liangli

Reputation: 1197

Suppose you have df with columns A B C.

The most simple way is:

df = df.reindex(['B','C','A'], axis=1)

Upvotes: 75

Kaustubh J

Reputation: 792

Hackiest method in the book

df.insert(0, "test", df["mean"])
df = df.drop(columns=["mean"]).rename(columns={"test": "mean"})

Upvotes: 4

Shoresh

Reputation: 2853

A simple approach is using set(), in particular when you have a long list of columns and do not want to handle them manually:

cols = list(set(df.columns.tolist()) - set(['mean']))
cols.insert(0, 'mean')
df = df[cols]
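A caveat: a Python set has no defined iteration order, so list(set(...)) may shuffle the remaining columns (with string labels the order can even change between interpreter runs, because string hashing is randomized). A sketch that keeps a set for fast membership tests but iterates over df.columns, so the original order survives; the column names and the front list are made up:

```python
import pandas as pd

df = pd.DataFrame(columns=["b", "d", "a", "mean", "c"])

front = ["mean", "a"]
exclude = set(front)  # a set is still handy for fast membership tests ...
# ... but iterate over df.columns (not over the set) so the original order is kept
cols = front + [c for c in df.columns if c not in exclude]
df = df[cols]
print(list(df.columns))  # ['mean', 'a', 'b', 'd', 'c']
```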

Upvotes: 3

Xopi García

Reputation: 386

To set an existing column right/left of another, based on their names:

import numpy as np
import pandas as pd

def df_move_column(df, col_to_move, col_left_of_destiny="", right_of_col_bool=True):
    cols = list(df.columns.values)
    index_max = len(cols) - 1

    if not right_of_col_bool:
        # placing a column left of column "c" is the same as placing it right of the column before "c"
        # ... except when "c" is the first column: then recursively move every other column to the right of it
        aux = cols.index(col_left_of_destiny)
        if not aux:
            for g in [x for x in cols[::-1] if x != col_to_move]:
                df = df_move_column(
                        df, 
                        col_to_move=g, 
                        col_left_of_destiny=col_to_move
                        )
            return df
        col_left_of_destiny = cols[aux - 1]

    index_old = cols.index(col_to_move)
    index_new = 0
    if len(col_left_of_destiny):
        index_new = cols.index(col_left_of_destiny) + 1

    if index_old == index_new:
        return df

    if index_new < index_old:
        index_new = np.min([index_new, index_max])
        cols = (
            cols[:index_new]
            + [cols[index_old]]
            + cols[index_new:index_old]
            + cols[index_old + 1 :]
        )
    else:
        cols = (
            cols[:index_old]
            + cols[index_old + 1 : index_new]
            + [cols[index_old]]
            + cols[index_new:]
        )

    df = df[cols]
    return df

E.g.

cols = list("ABCD")
df2 = pd.DataFrame(np.arange(4)[np.newaxis, :], columns=cols)
for k in cols:
    print(30 * "-")
    for g in [x for x in cols if x != k]:
        df_new = df_move_column(df2, k, g)
        print(f"{k} after {g}:  {df_new.columns.values}")
for k in cols:
    print(30 * "-")
    for g in [x for x in cols if x != k]:
        df_new = df_move_column(df2, k, g, right_of_col_bool=False)
        print(f"{k} before {g}:  {df_new.columns.values}")

Output:

Upvotes: 0

Mathia Haure-Touzé

Reputation: 331

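You can use a set, which is an unordered collection of unique elements, to collect the other columns. A set does not guarantee any order; with the small integer labels used here, the remaining columns happen to come out in their original order, which keeps the "order of the other columns untouched":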

other_columns = list(set(df.columns).difference(["mean"])) #[0, 1, 2, 3, 4]

Then, you can use a lambda to move a specific column to the front by:

In [1]: import numpy as np                                                                               

In [2]: import pandas as pd                                                                              

In [3]: df = pd.DataFrame(np.random.rand(10, 5))                                                         

In [4]: df["mean"] = df.mean(1)                                                                          

In [5]: move_col_to_front = lambda df, col: df[[col]+list(set(df.columns).difference([col]))]            

In [6]: move_col_to_front(df, "mean")                                                                    
Out[6]: 
       mean         0         1         2         3         4
0  0.697253  0.600377  0.464852  0.938360  0.945293  0.537384
1  0.609213  0.703387  0.096176  0.971407  0.955666  0.319429
2  0.561261  0.791842  0.302573  0.662365  0.728368  0.321158
3  0.518720  0.710443  0.504060  0.663423  0.208756  0.506916
4  0.616316  0.665932  0.794385  0.163000  0.664265  0.793995
5  0.519757  0.585462  0.653995  0.338893  0.714782  0.305654
6  0.532584  0.434472  0.283501  0.633156  0.317520  0.994271
7  0.640571  0.732680  0.187151  0.937983  0.921097  0.423945
8  0.562447  0.790987  0.200080  0.317812  0.641340  0.862018
9  0.563092  0.811533  0.662709  0.396048  0.596528  0.348642

In [7]: move_col_to_front(df, 2)                                                                         
Out[7]: 
          2         0         1         3         4      mean
0  0.938360  0.600377  0.464852  0.945293  0.537384  0.697253
1  0.971407  0.703387  0.096176  0.955666  0.319429  0.609213
2  0.662365  0.791842  0.302573  0.728368  0.321158  0.561261
3  0.663423  0.710443  0.504060  0.208756  0.506916  0.518720
4  0.163000  0.665932  0.794385  0.664265  0.793995  0.616316
5  0.338893  0.585462  0.653995  0.714782  0.305654  0.519757
6  0.633156  0.434472  0.283501  0.317520  0.994271  0.532584
7  0.937983  0.732680  0.187151  0.921097  0.423945  0.640571
8  0.317812  0.790987  0.200080  0.641340  0.862018  0.562447
9  0.396048  0.811533  0.662709  0.596528  0.348642  0.563092
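Because set order is not guaranteed, here is a variant of move_col_to_front that relies on Index.drop() instead, which keeps the other columns in their original order by construction:

```python
import numpy as np
import pandas as pd

def move_col_to_front(df: pd.DataFrame, col) -> pd.DataFrame:
    # Index.drop removes `col` while keeping every other label in its original order
    return df[[col] + df.columns.drop(col).tolist()]

df = pd.DataFrame(np.random.rand(3, 5))
df["mean"] = df.mean(axis=1)
print(list(move_col_to_front(df, "mean").columns))  # ['mean', 0, 1, 2, 3, 4]
print(list(move_col_to_front(df, 2).columns))       # [2, 0, 1, 3, 4, 'mean']
```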

Upvotes: 6

rra

Reputation: 809

Here is a very simple answer to this (only one line).

You can do it as follows after you have added the 'mean' column to your df.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df['mean'] = df.mean(1)
df
           0           1           2           3           4        mean
0   0.929616    0.316376    0.183919    0.204560    0.567725    0.440439
1   0.595545    0.964515    0.653177    0.748907    0.653570    0.723143
2   0.747715    0.961307    0.008388    0.106444    0.298704    0.424512
3   0.656411    0.809813    0.872176    0.964648    0.723685    0.805347
4   0.642475    0.717454    0.467599    0.325585    0.439645    0.518551
5   0.729689    0.994015    0.676874    0.790823    0.170914    0.672463
6   0.026849    0.800370    0.903723    0.024676    0.491747    0.449473
7   0.526255    0.596366    0.051958    0.895090    0.728266    0.559587
8   0.818350    0.500223    0.810189    0.095969    0.218950    0.488736
9   0.258719    0.468106    0.459373    0.709510    0.178053    0.414752


### here you can add the line below and it should work
# Don't forget the double parentheses (( )) around the column names: list() takes a single iterable, so without them you'll get an error.

df = df[list(('mean',0, 1, 2,3,4))]
df

        mean           0           1           2           3           4
0   0.440439    0.929616    0.316376    0.183919    0.204560    0.567725
1   0.723143    0.595545    0.964515    0.653177    0.748907    0.653570
2   0.424512    0.747715    0.961307    0.008388    0.106444    0.298704
3   0.805347    0.656411    0.809813    0.872176    0.964648    0.723685
4   0.518551    0.642475    0.717454    0.467599    0.325585    0.439645
5   0.672463    0.729689    0.994015    0.676874    0.790823    0.170914
6   0.449473    0.026849    0.800370    0.903723    0.024676    0.491747
7   0.559587    0.526255    0.596366    0.051958    0.895090    0.728266
8   0.488736    0.818350    0.500223    0.810189    0.095969    0.218950
9   0.414752    0.258719    0.468106    0.459373    0.709510    0.178053

Upvotes: 6

plhn

Reputation: 5273

Just flipping the column order often helps.

df[df.columns[::-1]]

Or just shuffle the columns to take a quick look.

import random
cols = list(df.columns)
random.shuffle(cols)
df[cols]

Upvotes: 5

hayj

Reputation: 1263

I think this function is more straightforward. You just need to specify a subset of columns to put at the start, at the end, or both:

def reorder_df_columns(df, start=None, end=None):
    """
        Reorder the columns of a DataFrame.
        Columns listed in `start` are moved to the left (in the given order),
        and columns listed in `end` are moved to the right.
        Names that are not present in the DataFrame are ignored.
    """
    if start is None:
        start = []
    if end is None:
        end = []
    assert isinstance(start, list) and isinstance(end, list)
    cols = list(df.columns)
    # Build filtered copies instead of removing items from a list while iterating over it
    # (that skips elements and would also mutate the caller's lists)
    start = [c for c in start if c in cols]
    end = [c for c in end if c in cols and c not in start]
    for c in start + end:
        cols.remove(c)
    cols = start + cols + end
    return df[cols]

Upvotes: 1

user_stack_overflow

Reputation: 485

I have a very specific use case for re-ordering column names in pandas. Sometimes I am creating a new column in a dataframe that is based on an existing column. By default pandas will insert my new column at the end, but I want the new column to be inserted next to the existing column it's derived from.

def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    '''
    Helper function to re-arrange the order of items in a list.
    Useful for moving column in pandas dataframe.

    Inputs:
        input_list - list
        input_item_to_move - item in list to move
        input_item_insert_here - item in list, insert before 

    returns:
        output_list
    '''
    # make copy for output, make sure it's a list
    output_list = list(input_list)

    # index of item to move
    idx_move = output_list.index(input_item_to_move)

    # pop off the item to move
    itm_move = output_list.pop(idx_move)

    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)

    # insert item to move into here
    output_list.insert(idx_insert, itm_move)

    return output_list


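import pandas as pd
from IPython.display import display  # display() is built into Jupyter; elsewhere this needs IPython (or use print())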

# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
display(df)
print()

# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
display(df)
print()

# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df=df[ls_cols]
print('Step 3: rearrange columns')
display(df)

Upvotes: 1

Alexander

Reputation: 109696

You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.

cols = ['mean']  + [col for col in df if col != 'mean']
df = df[cols]

You can also use a more general approach. In this example, the last column (indicated by -1) is inserted as the first column.

cols = [df.columns[-1]] + [col for col in df if col != df.columns[-1]]
df = df[cols]

You can also use this approach for reordering columns in a desired order if they are present in the DataFrame.

inserted_cols = ['a', 'b', 'c']
cols = ([col for col in inserted_cols if col in df] 
        + [col for col in df if col not in inserted_cols])
df = df[cols]

Upvotes: 83

Pygirl

Reputation: 13349

import numpy as np
import pandas as pd
df = pd.DataFrame()
column_names = ['x','y','z','mean']
for col in column_names: 
    df[col] = np.random.randint(0,100, size=10000)

You can try out the following solutions:

Solution 1:

df = df[ ['mean'] + [ col for col in df.columns if col != 'mean' ] ]

Solution 2:

df = df[['mean', 'x', 'y', 'z']]

Solution 3:

col = df.pop("mean")
df.insert(0, col.name, col)  # insert() works in place and returns None, so don't assign its result to df

Solution 4:

df.set_index(df.columns[-1], inplace=True)
df.reset_index(inplace=True)

Solution 5:

cols = list(df)
cols = [cols[-1]] + cols[:-1]
df = df[cols]

Solution 6:

order = [1,2,3,0] # setting column's order
df = df[[df.columns[i] for i in order]]

Time Comparison:

Solution 1: CPU times: user 1.05 ms, sys: 35 µs, total: 1.08 ms; Wall time: 995 µs
Solution 2: CPU times: user 933 µs, sys: 0 ns, total: 933 µs; Wall time: 800 µs
Solution 3: CPU times: user 0 ns, sys: 1.35 ms, total: 1.35 ms; Wall time: 1.08 ms
Solution 4: CPU times: user 1.23 ms, sys: 45 µs, total: 1.27 ms; Wall time: 986 µs
Solution 5: CPU times: user 1.09 ms, sys: 19 µs, total: 1.11 ms; Wall time: 949 µs
Solution 6: CPU times: user 955 µs, sys: 34 µs, total: 989 µs; Wall time: 859 µs
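Single runs in the ~1 ms range like these are easily swamped by noise. A rough sketch of a more repeatable comparison with the standard-library timeit module, using the same 10,000-row frame shape as above and a subset of the solutions (the in-place ones are left out because they change df between runs). No timing results are claimed here; numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({c: np.random.randint(0, 100, size=10_000) for c in ["x", "y", "z", "mean"]})

# Each candidate returns a new frame with 'mean' first; none of them mutates df
candidates = {
    "list comprehension": lambda: df[["mean"] + [c for c in df.columns if c != "mean"]],
    "explicit list": lambda: df[["mean", "x", "y", "z"]],
    "reindex": lambda: df.reindex(columns=["mean", "x", "y", "z"]),
    "iloc": lambda: df.iloc[:, [3, 0, 1, 2]],
}

for name, fn in candidates.items():
    assert list(fn().columns) == ["mean", "x", "y", "z"], name
    # Best of 5 repeats of 200 calls each, reported per call
    best = min(timeit.repeat(fn, number=200, repeat=5)) / 200
    print(f"{name:20s} {best * 1e6:8.1f} µs per call")
```

Taking the minimum over several repeats is a common way to reduce the influence of background activity on the measurement.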

Upvotes: 131

Kweweli

Reputation: 335

Most of the answers do not generalize enough, and pandas' reindex_axis method is a little tedious, so I offer a simple function that moves an arbitrary number of columns to any position, using a dictionary where key = column name and value = the position to move it to. If your DataFrame is large, pass big_data=True and the function will return the ordered list of columns instead, which you can then use to slice your data.

def order_column(df, columns, big_data = False):

    """Re-orders DataFrame column(s)
       Parameters :
       df       -- dataframe
       columns  -- a dictionary:
                   key   = column name
                   value = position to move it to
                   (moves are applied one after another, in dictionary order)
       big_data -- boolean
                   True  = return only the ordered column names as a list;
                           you can then slice the data with this list
                   False = default - return a reordered copy of the dataframe
    """
    ordered_col = df.columns.tolist()

    for key, value in columns.items():

        ordered_col.remove(key)
        ordered_col.insert(value, key)

    if big_data:

        return ordered_col

    return df[ordered_col]

# e.g.
df = pd.DataFrame({'chicken wings': np.random.rand(10, 1).flatten(), 'taco': np.random.rand(10,1).flatten(),
                          'coffee': np.random.rand(10, 1).flatten()})
df['mean'] = df.mean(1)

df = order_column(df, {'mean': 0, 'coffee':1 })

>>>

col = order_column(df, {'mean': 0, 'coffee':1 }, True)

col
>>>
['mean', 'coffee', 'chicken wings', 'taco']

# you could grab it by doing this

df = df[col]

Upvotes: 2

Alvaro Silvino

Reputation: 9753

In your case,

df = df.reindex(columns=['mean',0,1,2,3,4])

will do exactly what you want.

In my case (general form):

df = df.reindex(columns=sorted(df.columns))
df = df.reindex(columns=(['opened'] + list([a for a in df.columns if a != 'opened']) ))

Upvotes: 212

clocker

Reputation: 1366

I ran into a similar question myself, and just wanted to add what I settled on. I liked the reindex_axis() method for changing column order. This worked:

df = df.reindex_axis(['mean'] + list(df.columns[:-1]), axis=1)

An alternate method based on the comment from @Jorge:

df = df.reindex(columns=['mean'] + list(df.columns[:-1]))

Although reindex_axis seems to be slightly faster in micro benchmarks than reindex, I think I prefer the latter for its directness.

Upvotes: 23
