Reputation: 495
This is my df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(5,6)),columns=['a','b','c','d','e','f'])
I have to perform quite a few operations, and at the end I might be left with a df whose column order has changed or that is missing one or more columns, e.g.:
case 1 - the final column order is ['e','b','d','f']
case 2 - the final column order is ['d','a','f','e']
case 3 - the final column order is ['d','f','e','b']
How do I make sure that, regardless of how many columns I am left with, the final column order is [...,'d','e','f']?
For example:
if I am left with ['e','b','d','f'], the final column order should be ['b','d','e','f']
for ['d','a','f','e'], the rearranged column order should be ['a','d','e','f']
for ['c','a','b','e','f','d'], the rearranged column order should be ['a','b','c','d','e','f']
i.e. ['other columns', 'd','e','f'].
My original df has more than 80 columns, so I need to be able to do this dynamically.
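The rule described above can be pinned down as a small function on a plain list of column names. This is only a sketch of the desired ordering, assuming (as the third example suggests) that the "other" columns should end up sorted alphabetically; `target_order` is a hypothetical name, not part of pandas.

```python
def target_order(columns, tail=("d", "e", "f")):
    """Hypothetical helper: every column not in `tail`, sorted alphabetically,
    followed by whichever `tail` columns are present, in the order of `tail`."""
    others = sorted(c for c in columns if c not in tail)
    present_tail = [c for c in tail if c in columns]
    return others + present_tail

# The examples from the question
print(target_order(["e", "b", "d", "f"]))            # ['b', 'd', 'e', 'f']
print(target_order(["d", "a", "f", "e"]))            # ['a', 'd', 'e', 'f']
print(target_order(["c", "a", "b", "e", "f", "d"]))  # ['a', 'b', 'c', 'd', 'e', 'f']
```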
Upvotes: 0
Views: 160
Reputation: 862611
If all values from L are always present in the DataFrame columns, use Index.difference, append the list L to the result, and select by that subset:
L = ['d','e','f']
df = df[df.columns.difference(L).tolist() + L]
# difference() sorts the remaining columns by default; to keep them in their current order instead:
#df = df[df.columns.difference(L, sort=False).tolist() + L]
print(df)
   a  b  c  d  e  f
0  8  1  4  6  2  5
1  8  7  7  8  5  9
2  8  6  2  6  5  5
3  4  9  9  5  1  5
4  2  4  2  1  1  9
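To see what the `sort` argument of `Index.difference` changes, here is a small deterministic example (the column labels and values are made up for illustration). By default the remaining labels are sorted; with `sort=False` they keep the order they currently have in the DataFrame.

```python
import pandas as pd

# Small deterministic frame whose "other" columns are not in alphabetical order
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["c", "e", "a", "d", "f"])
L = ["d", "e", "f"]

# Default: difference() sorts the remaining labels, so c, a become a, c
sorted_cols = df.columns.difference(L).tolist() + L
print(sorted_cols)   # ['a', 'c', 'd', 'e', 'f']

# sort=False: the remaining labels keep their current order
kept_cols = df.columns.difference(L, sort=False).tolist() + L
print(kept_cols)     # ['c', 'a', 'd', 'e', 'f']

print(df[kept_cols])
```

Which variant to use depends on whether the "other" columns should be alphabetical (as in the question's examples) or left where the earlier operations put them.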
If some values from L might be missing from the DataFrame, add Index.intersection:
df = pd.DataFrame(np.random.randint(1,10,(5,6)),columns=['a','b','c','d','e','g'])
L = ['d','e','f']
df = df[df.columns.difference(L).tolist() + df.columns.intersection(L).tolist()]
print(df)
   a  b  c  g  d  e
0  2  3  2  7  4  4
1  9  3  4  6  9  7
2  9  6  1  9  7  7
3  4  6  1  8  2  8
4  4  2  4  2  6  8
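One caveat: `Index.intersection` is not guaranteed to return the labels in the order of L; it may follow the DataFrame's current column order instead, so for a frame left as ['d','f','e','b'] (case 3) the tail could come out as d, f, e. A sketch that always takes the tail in the order of L is to build it with a list comprehension over L. `move_to_end` is a hypothetical helper name used only for illustration.

```python
import pandas as pd

def move_to_end(df, tail):
    """Hypothetical helper: put every column not in `tail` first (sorted),
    then the columns of `tail` that exist in `df`, in the order of `tail`."""
    others = df.columns.difference(tail).tolist()
    present = [c for c in tail if c in df.columns]  # order comes from `tail`
    return df[others + present]

L = ["d", "e", "f"]

# Case 3 from the question: d, f, e are shuffled and a, c are gone
df3 = pd.DataFrame([[1, 2, 3, 4]], columns=["d", "f", "e", "b"])
print(move_to_end(df3, L).columns.tolist())   # ['b', 'd', 'e', 'f']

# 'f' is missing entirely, 'g' is an extra column
dfg = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=["e", "a", "g", "d", "c", "b"])
print(move_to_end(dfg, L).columns.tolist())   # ['a', 'b', 'c', 'g', 'd', 'e']
```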
Or, if you need all columns from the list, use DataFrame.reindex; columns that are missing from the DataFrame are added and filled with NaN:
df = df.reindex(df.columns.difference(L).tolist() + L, axis=1)
print(df)
   a  b  c  g  d  e   f
0  1  7  7  8  1  2 NaN
1  9  8  6  3  2  7 NaN
2  8  1  7  6  8  2 NaN
3  3  3  2  5  2  2 NaN
4  8  4  4  1  4  1 NaN
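If NaN is not the placeholder you want for the added columns, `DataFrame.reindex` also accepts `fill_value`, and passing the labels as `columns=` is equivalent to `axis=1`. A minimal sketch with made-up data, assuming 0 is an acceptable filler for the missing column:

```python
import pandas as pd

L = ["d", "e", "f"]
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "g", "d", "e"])

# Sorted "other" columns first, then every column of L (f does not exist yet)
cols = df.columns.difference(L).tolist() + L

# columns= is equivalent to passing axis=1; fill_value replaces the default NaN
out = df.reindex(columns=cols, fill_value=0)
print(out)
```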
Upvotes: 2