Mark K
Mark K

Reputation: 9338

Python to combine Excel spreadsheets

Hello all…a question in using Panda to combine Excel spreadsheets.

The problem is that, sequence of columns are lost when they are combined. If there are more files to combine, the format will be even worse.

If gives an error message, if the number of files are big.

ValueError: column index (256) not an int in range(256) 

What I am using is below:

import pandas as pd

df = pd.DataFrame()

for f in ['c:\\1635.xls', 'c:\\1644.xls']:
    data = pd.read_excel(f, 'Sheet1')
    data.index = [os.path.basename(f)] * len(data)
    df = df.append(data)

df.to_excel('c:\\CB.xls')

The original files and combined look like: enter image description here

what's the best way to combine great amount of such similar Excel files?

thanks.

Upvotes: 0

Views: 1968

Answers (2)

DeepSpace
DeepSpace

Reputation: 81594

Instead of data.index = [os.path.basename(f)] * len(data) you should use df.reset_index().

For example:

1.xlsx:

a   b
1   1
2   2
3   3

2.xlsx:

a   b
4   4
5   5
6   6

code:

 df = pd.DataFrame()

 for f in [r"C:\Users\Adi\Desktop\1.xlsx", r"C:\Users\Adi\Desktop\2.xlsx"]:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

 df.reset_index(inplace=True, drop=True)
 df.to_excel('c:\\CB.xls')

cb.xls:

        a   b
    0   1   1
    1   2   2
    2   3   3
    3   4   4
    4   5   5
    5   6   6

If you don't want the dataframe's index to be in the output file, you can use df.to_excel('c:\\CB.xls', index=False).

Upvotes: 1

Sinux
Sinux

Reputation: 1808

I usually use xlrd and xlwt:

#!/usr/bin/env python
# encoding: utf-8

import xlwt
import xlrd
import os


current_file = xlwt.Workbook()
write_table = current_file.add_sheet('sheet1', cell_overwrite_ok=True)

key_list = [u'City', u'Country', u'Received Date', u'Shipping Date', u'Weight', u'1635']
for title_index, text in enumerate(key_list):
    write_table.write(0, title_index, text)


file_list = ['1635.xlsx', '1644.xlsx']

i = 1
for name in file_list:
    data = xlrd.open_workbook(name)

    table = data.sheets()[0]
    nrows = table.nrows
    for row in range(nrows):
        if row == 0:
            continue
        for index, context in enumerate(table.row_values(row)):
            write_table.write(i, index, context)
        i += 1


current_file.save(os.getcwd() + '/result.xls')

Upvotes: 1

Related Questions