Ciaran
Ciaran

Reputation: 478

Parsing Excel data with pandas - why is it skipping columns when renaming columns?

I really hope its something simple im missing. I'm reading in excel workbooks using python pandas. When I rename my columns to be numbers 1:len(columns) it skips the first few columns.

It seems to only skip them if the cells don't have values in them. Even if the column doesn't have a value, i still want it to number them 1,2,3...

xl = pd.ExcelFile('Excel.xlsm')
df = xl.parse('Worksheet1')
df.columns = [str(x) for x in range(0,df.shape[1])]

I hope someone could point me in the right direction. I have tried using a header to skip the first few columns, but nothing gave me a consistent and reliable result. Thanks!

This is the output I get with skipped columns.

Upvotes: 0

Views: 829

Answers (2)

ShaneCalder
ShaneCalder

Reputation: 346

Here is a tutorial for python and excel:
https://github.com/python-excel/tutorial/raw/master/python-excel.pdf
Pages 10 and 11 have how to row and columns slice. There are more options as you go through the document.
GitHub page
https://github.com/python-excel/tutorial

Another option is to delete
https://docs.aspose.com/display/cellsjava/Inserting+and+Deleting+Rows+and+Columns+in+Python
I hope this helps.

Upvotes: 0

jezrael
jezrael

Reputation: 862511

#dataframe have default columns names
df = pd.DataFrame({0:list('abcdef'),
                   1:[4,5,4,5,5,4],
                   2:[7,8,9,4,2,3]})

print (df)
   0  1  2
0  a  4  7
1  b  5  8
2  c  4  9
3  d  5  4
4  e  5  2
5  f  4  3

#first column called index
print (df.index)
RangeIndex(start=0, stop=6, step=1)

#check columns names (RangeIndex can be also)
print (df.columns)
Int64Index([0, 1, 2], dtype='int64')

#add 1 to columns anmes and convert to str
df.columns = (df.columns + 1).astype(str)
print (df)
   1  2  3
0  a  4  7
1  b  5  8
2  c  4  9
3  d  5  4
4  e  5  2
5  f  4  3

print (df.columns)
Index(['1', '2', '3'], dtype='object')

Another solution for change columns names is rename:

df = df.rename(columns = lambda x: str(x + 1))
print (df.columns)
Index(['1', '2', '3'], dtype='object')

If want create 1,2,3..N range string columns:

df = pd.DataFrame({'a':list('abcdef'),
                   'f':[4,5,4,5,5,4],
                   'm':[7,8,9,4,2,3]})

print (df)
   a  f  m
0  a  4  7
1  b  5  8
2  c  4  9
3  d  5  4
4  e  5  2
5  f  4  3

df.columns = pd.RangeIndex(1, df.shape[1] + 1).astype(str)

print (df.columns)
Index(['1', '2', '3'], dtype='object')

print (df)
   1  2  3
0  a  4  7
1  b  5  8
2  c  4  9
3  d  5  4
4  e  5  2
5  f  4  3

Upvotes: 1

Related Questions