Reputation: 653
I have a pandas DataFrame that I load as follows:
data=pd.read_csv("training-set-org.csv",sep=',', header = None)
Printing the first rows gives:
print(data.head())
           0        1          2          3  4     5      6       7  \
0  22.896448  33.1366  18.738063  26.846212  6  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  6  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  6  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  6  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  6  4242  50257   64072
Now I drop column 4:
data.drop(data.columns[4],axis=1,inplace=True)
From what I understand, data.columns[4] returns the label at position 4, which here is also 4, so the right column is dropped.
Now, when I print the dataframe I get:
printing data:
           0        1          2          3     5      6       7
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072
As you can see, the label 4 is now missing.
How do I relabel the dataframe so that every column label after the dropped one shifts left, giving labels 0, 1, 2, ..., 6 instead of running up to 7?
I want to use the dataframe data with the reduced number of columns and work on the columns using data.iloc[:,i] in a loop.
How do I do this? I am still a beginner in Python, so any help is appreciated.
Upvotes: 1
Views: 458
Reputation: 863226
You can assign the default column labels with a RangeIndex:
data.columns = pd.RangeIndex(len(data.columns))
print (data)
           0        1          2          3     4      5       6
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072
Or use range:
data.columns = range(len(data.columns))
print (data)
           0        1          2          3     4      5       6
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072
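For reference, here is a minimal end-to-end sketch of the drop-then-relabel steps. The small in-memory frame is a hypothetical stand-in for training-set-org.csv (two rows copied from the question), and column_sums is just an illustrative loop body, not something from the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: two rows, eight unlabeled columns.
data = pd.DataFrame([
    [22.896448, 33.1366, 18.738063, 26.846212, 6, 4242, 50257, 131962],
    [22.896448, 33.1366, 18.738063, 26.846212, 6, 4242, 50257, 68719],
])

# Drop the column at position 4; its label (4) disappears with it.
data = data.drop(data.columns[4], axis=1)
labels_after_drop = list(data.columns)

# Relabel so the labels run 0..n-1 again.
data.columns = range(len(data.columns))
labels_after_reset = list(data.columns)

# iloc selects by position, so a loop like this works with or without
# the relabeling step; the relabeling only matters for label-based access.
column_sums = [data.iloc[:, i].sum() for i in range(data.shape[1])]

print(labels_after_drop)
print(labels_after_reset)
```

Since data.iloc[:, i] is purely positional, the relabeling is only needed if you also access columns by label, for example data[4].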
Timings (for interest only):
In [126]: %timeit data.columns = range(len(data.columns))
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.4 µs per loop
In [127]: %timeit data.columns = pd.RangeIndex(len(data.columns))
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.4 µs per loop
In [128]: %timeit data.columns = np.arange(len(data.columns))
The slowest run took 8.52 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 45.2 µs per loop
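If you want to rerun this comparison outside IPython, a rough sketch with the standard-library timeit module could look like the following. The 5x7 frame of zeros and the loop counts are arbitrary assumptions, and the numbers will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame with 7 columns; its contents do not affect the timing.
data = pd.DataFrame(np.zeros((5, 7)))

def with_range():
    data.columns = range(len(data.columns))

def with_range_index():
    data.columns = pd.RangeIndex(len(data.columns))

def with_arange():
    data.columns = np.arange(len(data.columns))

candidates = {
    "range": with_range,
    "pd.RangeIndex": with_range_index,
    "np.arange": with_arange,
}

loops = 1000
results = {}
for name, func in candidates.items():
    # Best of 3 repeats, converted to microseconds per single assignment.
    best = min(timeit.repeat(func, number=loops, repeat=3)) / loops
    results[name] = best * 1e6
    print(f"{name}: {results[name]:.1f} µs per loop")
```

At these scales the differences are a few microseconds per assignment, so readability is a reasonable tie-breaker.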
Upvotes: 1
Reputation: 4054
If your column labels are just integers, you can use the code below:
import numpy as np
data.columns = np.arange(len(data.columns))
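One small difference from assigning range: as far as I know, pandas keeps a range as a lazy RangeIndex, while a NumPy array becomes a materialized integer index. The labels themselves are the same either way. A quick sketch to check this on your own installation (the 2x3 frame is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame; only the column labels matter here.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

df.columns = range(len(df.columns))
from_range = df.columns       # expected to be a RangeIndex

df.columns = np.arange(len(df.columns))
from_arange = df.columns      # expected to be a plain integer Index

# The class name of the integer index depends on the pandas version.
print(type(from_range).__name__, type(from_arange).__name__)
```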
Upvotes: 0