Stella
Stella

Reputation: 1564

How to read Excel with merge cells and save the Excel to a dict in python

I've a table as picture shown which with merged cells. How can I read the Excel table in python and saved in a dict?

table_dict={S1:[a,b,c,d],
            S2:[[a1,a2,a3],[b1,b2,b3],[d1,d2,d3]],
            S3:[[a4,a5,a6][b4,b5,b6][c4,c5,c6][d4,d5,d6]]}

tabel with merge cells

Upvotes: 1

Views: 1536

Answers (1)

Peter Gibson
Peter Gibson

Reputation: 19554

Not sure about excel documents, but you can read files in CSV format with the csv module. From the docs:

>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam

Testing with your file as outputted from LibreOffice (I don't have Excel on this machine), merged cells are split and padded with blank cells (as though they weren't merged in the first place). So you'd have something like:

[['S1', 'S2', '', '', 'S3', '', ''],
 ['a', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
 ['b', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6'],
 ... etc]

Then you just need a script to convert this to the format you desired.

import csv
from collections import defaultdict

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    # rotate the data so we have a list of columns, not a list of rows
    # note this is not very robust
    data = zip(*reader)
    results = defaultdict(list)
    last = None
    for col in data:
        # pull the column name off the front
        name = col[0]
        cells = col[1:]
        # use the previous column name if blank
        if name is '':
            name = last
        # check for missing column name at start
        if name is None:
            print 'invalid data:', col
            continue
        results[name].append(cells)
        last = name
    print results

Yields:

defaultdict(<type 'list'>, {
    'S3': [('a4', 'b4', 'c4', 'd4'), ('a5', 'b5', 'c5', 'd5'), ('a6', 'b6', 'c6', 'd6')], 
    'S2': [('a1', 'b1', 'c1', 'd1'), ('a2', 'b2', 'c2', 'd2'), ('a3', 'b3', 'c3', 'd3')], 
    'S1': [('a', 'b', 'c', 'd')]})

Upvotes: 1

Related Questions