sunny

Reputation: 653

How do I relabel columns after dropping a column from a pandas DataFrame?

I have a pandas DataFrame loaded as follows:

  data = pd.read_csv("training-set-org.csv", sep=',', header=None)

Printing the first rows gives:

 print(data.head())

             0        1          2          3  4     5      6       7
  0  22.896448  33.1366  18.738063  26.846212  6  4242  50257  131962
  1  22.896448  33.1366  18.738063  26.846212  6  4242  50257   68719
  2  22.896448  33.1366  18.738063  26.846212  6  4242  50257  171647
  3  22.896448  33.1366  18.738063  26.846212  6  4242  50257  246620
  4  22.896448  33.1366  18.738063  26.846212  6  4242  50257   64072

Now I drop column 4:

  data.drop(data.columns[4], axis=1, inplace=True)

As I understand it, data.columns[4] is the label at position 4, which here is the column labeled 4, as intended.

Now, when I print the DataFrame I get:

  printing data:
             0        1          2          3     5      6       7
  0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
  1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
  2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
  3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
  4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072

As you can see, the label 4 is missing.

How do I relabel the DataFrame so that every column label shifts left, i.e. the columns are labeled 0, 1, 2, ..., 6 rather than running up to 7?
I want to use the reduced DataFrame and work on its columns with data.iloc[:, i] in a loop. How do I do this? I am still a beginner in Python, so any help is appreciated.

Upvotes: 1

Views: 458

Answers (3)

jezrael

Reputation: 863226

You can assign the default column labels by creating a RangeIndex:

data.columns = pd.RangeIndex(len(data.columns))
print(data)
           0        1          2          3     4      5       6
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072

Or use range:

data.columns = range(len(data.columns))
print(data)
           0        1          2          3     4      5       6
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072
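
For a self-contained check of the approach above, here is a minimal sketch. It assumes a small made-up frame (`demo`) in place of the CSV from the question, so the names and the two rows are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question: 8 columns with default labels 0..7.
demo = pd.DataFrame([
    [22.896448, 33.1366, 18.738063, 26.846212, 6, 4242, 50257, 131962],
    [22.896448, 33.1366, 18.738063, 26.846212, 6, 4242, 50257, 68719],
])

# Drop the column at position 4; the remaining labels keep their old values.
demo = demo.drop(demo.columns[4], axis=1)
labels_after_drop = list(demo.columns)   # label 4 is now missing

# Relabel with consecutive integers starting at 0.
demo.columns = range(len(demo.columns))
labels_after_reset = list(demo.columns)

print(labels_after_drop)
print(labels_after_reset)
```

Deriving the length from `demo.columns` keeps the relabeling correct however many columns were dropped.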

Timings (just out of interest :)

In [126]: %timeit data.columns = range(len(data.columns))
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.4 µs per loop

In [127]: %timeit data.columns = pd.RangeIndex(len(data.columns))
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.4 µs per loop

In [128]: %timeit data.columns = np.arange(len(data.columns))
The slowest run took 8.52 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 45.2 µs per loop

Upvotes: 1

Darius

Reputation: 12102

It's easy, try this:

data.columns = range(7)
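
The hard-coded 7 fits this particular frame, but as far as I know pandas raises a ValueError when the number of new labels does not match the number of columns. A minimal sketch of that, assuming a made-up one-row frame `demo` (not the asker's data):

```python
import pandas as pd

# Hypothetical frame with 7 columns, labeled like the question's frame after the drop.
demo = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7]], columns=[0, 1, 2, 3, 5, 6, 7])

try:
    demo.columns = range(6)   # wrong length on purpose
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Deriving the length from the frame itself avoids the mismatch.
demo.columns = range(demo.shape[1])

print(mismatch_raised)
print(list(demo.columns))
```

So if the number of columns may change, something like `range(demo.shape[1])` or `range(len(demo.columns))` is probably the safer choice.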

Upvotes: 0

Spandan Brahmbhatt

Reputation: 4054

If your column labels are just integers, you can use the code below:

import numpy as np
data.columns = np.arange(len(data.columns))
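
Since the question plans to loop with data.iloc[:, i], it may be worth noting that iloc selects by position rather than by label, so the loop should give the same result with or without relabeling. A small sketch on a made-up frame `demo` (the values are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels skip 4, as in the question after the drop.
demo = pd.DataFrame([[10, 11, 12, 13, 15, 16, 17]], columns=[0, 1, 2, 3, 5, 6, 7])

# iloc is positional: position 4 is the column currently labeled 5.
before = [demo.iloc[:, i].tolist() for i in range(demo.shape[1])]

# Relabel as in the answer above, then repeat the same positional loop.
demo.columns = np.arange(len(demo.columns))
after = [demo.iloc[:, i].tolist() for i in range(demo.shape[1])]

print(before == after)
```

Relabeling matters mainly for label-based access such as `demo[4]` or `demo.loc[:, 4]`.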

Upvotes: 0
