Reputation: 1564
I've a table as picture shown which with merged cells. How can I read the Excel table in python and saved in a dict?
table_dict={S1:[a,b,c,d],
S2:[[a1,a2,a3],[b1,b2,b3],[d1,d2,d3]],
S3:[[a4,a5,a6][b4,b5,b6][c4,c5,c6][d4,d5,d6]]}
Upvotes: 1
Views: 1536
Reputation: 19554
Not sure about excel documents, but you can read files in CSV format with the csv module. From the docs:
>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
... for row in spamreader:
... print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
Testing with your file as outputted from LibreOffice (I don't have Excel on this machine), merged cells are split and padded with blank cells (as though they weren't merged in the first place). So you'd have something like:
[['S1', 'S2', '', '', 'S3', '', ''],
['a', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
['b', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6'],
... etc]
Then you just need a script to convert this to the format you desired.
import csv
from collections import defaultdict
with open('file.csv', 'rb') as csvfile:
reader = csv.reader(csvfile)
# rotate the data so we have a list of columns, not a list of rows
# note this is not very robust
data = zip(*reader)
results = defaultdict(list)
last = None
for col in data:
# pull the column name off the front
name = col[0]
cells = col[1:]
# use the previous column name if blank
if name is '':
name = last
# check for missing column name at start
if name is None:
print 'invalid data:', col
continue
results[name].append(cells)
last = name
print results
Yields:
defaultdict(<type 'list'>, {
'S3': [('a4', 'b4', 'c4', 'd4'), ('a5', 'b5', 'c5', 'd5'), ('a6', 'b6', 'c6', 'd6')],
'S2': [('a1', 'b1', 'c1', 'd1'), ('a2', 'b2', 'c2', 'd2'), ('a3', 'b3', 'c3', 'd3')],
'S1': [('a', 'b', 'c', 'd')]})
Upvotes: 1