Reputation: 9338
Hello all…a question in using Panda to combine Excel spreadsheets.
The problem is that, sequence of columns are lost when they are combined. If there are more files to combine, the format will be even worse.
If gives an error message, if the number of files are big.
ValueError: column index (256) not an int in range(256)
What I am using is below:
import pandas as pd
df = pd.DataFrame()
for f in ['c:\\1635.xls', 'c:\\1644.xls']:
data = pd.read_excel(f, 'Sheet1')
data.index = [os.path.basename(f)] * len(data)
df = df.append(data)
df.to_excel('c:\\CB.xls')
The original files and combined look like:
what's the best way to combine great amount of such similar Excel files?
thanks.
Upvotes: 0
Views: 1968
Reputation: 81594
Instead of data.index = [os.path.basename(f)] * len(data)
you should use df.reset_index()
.
For example:
1.xlsx:
a b
1 1
2 2
3 3
2.xlsx:
a b
4 4
5 5
6 6
code:
df = pd.DataFrame()
for f in [r"C:\Users\Adi\Desktop\1.xlsx", r"C:\Users\Adi\Desktop\2.xlsx"]:
data = pd.read_excel(f, 'Sheet1')
df = df.append(data)
df.reset_index(inplace=True, drop=True)
df.to_excel('c:\\CB.xls')
cb.xls:
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
If you don't want the dataframe's index to be in the output file, you can use df.to_excel('c:\\CB.xls', index=False)
.
Upvotes: 1
Reputation: 1808
I usually use xlrd
and xlwt
:
#!/usr/bin/env python
# encoding: utf-8
import xlwt
import xlrd
import os
current_file = xlwt.Workbook()
write_table = current_file.add_sheet('sheet1', cell_overwrite_ok=True)
key_list = [u'City', u'Country', u'Received Date', u'Shipping Date', u'Weight', u'1635']
for title_index, text in enumerate(key_list):
write_table.write(0, title_index, text)
file_list = ['1635.xlsx', '1644.xlsx']
i = 1
for name in file_list:
data = xlrd.open_workbook(name)
table = data.sheets()[0]
nrows = table.nrows
for row in range(nrows):
if row == 0:
continue
for index, context in enumerate(table.row_values(row)):
write_table.write(i, index, context)
i += 1
current_file.save(os.getcwd() + '/result.xls')
Upvotes: 1