Timothy Lombard

Reputation: 967

Python dict from excel report

I have an Excel file whose first sheet contains a report (screenshot omitted).

Company names are in bold (APPLE, EPSON, ROLAND), and each company's project names are listed in the rows below it.

Here is the sheet exported as CSV:

Report Date,10/10/18,,,
Page 1 of 1,,,,
,,,,
Project Name,,Job Number,Start Date,Due Date
,,,,
APPLE,,,,
macbook,,12345,1/1/19,2/1/19
iphone,,23456,1/2/19,2/2/19
,,,,
EPSON,,,,
ET-2000 printer,,34567,1/3/19,2/4/19
,,,,
ROLAND,,,,
RD-700,,45678,1/4/19,2/4/19

The worksheet is loaded into memory with openpyxl. My desired output is a Python dictionary keyed by company, where each value is the list of that company's projects. Below is what I have tried, but the output dict has all projects under every company rather than just that company's own projects.

from openpyxl import load_workbook
from collections import namedtuple
Record = namedtuple('Record', 'project_name job_number start_date due_date ')
from pprint import pprint
wb = load_workbook('SOquestion.xlsx')
ws = wb.active

def make_co_list(ws):
    co_list = []
    for _ in ws.rows:
        if _[0].value and _[2].value == None:
            co_list.append(_[0].value)

    co_list.remove('Report Date')
    co_list.remove('Page 1 of 1')

    return co_list

co_list = make_co_list(ws)
co_dict = {c:[] for c in co_list}

for k,v in co_dict.items():
    for row in ws.rows:
        if row[0].value == k:
            co = k
            for row in ws.rows:
                if co  and row[2].value and row[0].value not in ["Report Date", "Page 1 of 1", "Project Name", co_list] :
                    record = Record(row[0].value,
                                    row[2].value,
                                    row[3].value,
                                    row[4].value
                                    )
                    print("record", record)
                    co_dict[co].append(record) 

Upvotes: 0

Views: 64

Answers (1)

user3757614

Reputation: 1806

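The nested loop over ws.rows is the problem: for every company, the inner loop walks the whole sheet again and appends every project row it finds. I would use a state-based approach instead, remembering the current company while making a single pass over the rows (I have not tested this, but the principle should work):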

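import collections

# Report header labels in column A; these are not companies or projects.
HEADER_LABELS = {"Report Date", "Page 1 of 1", "Project Name"}

current_company = None
co_dict = collections.defaultdict(list)
for row in ws.rows:
    if not row[0].value or row[0].value in HEADER_LABELS:  # empty or header row
        continue
    if row[2].value is None:  # name but no job number: new company section
        current_company = row[0].value
        continue
    if current_company is None:  # project row before any company heading
        continue
    record = Record(row[0].value,
                    row[2].value,
                    row[3].value,
                    row[4].value
                    )
    print("record", record)
    co_dict[current_company].append(record)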
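
To check the idea without the workbook, here is a minimal sketch that runs the same single-pass logic over the CSV text from the question. The names CSV_TEXT, HEADER_LABELS and group_by_company are made up for this illustration and are not part of openpyxl. Note that the "Report Date" and "Page 1 of 1" rows also have a first cell and an empty third cell, so they look like company headings and have to be skipped explicitly. Since the data comes from CSV here, every value is a string; with openpyxl the job numbers and dates would probably come back as numbers and datetime objects.

```python
import csv
import io
from collections import defaultdict, namedtuple

Record = namedtuple("Record", "project_name job_number start_date due_date")

# The CSV text from the question, standing in for the worksheet.
CSV_TEXT = """\
Report Date,10/10/18,,,
Page 1 of 1,,,,
,,,,
Project Name,,Job Number,Start Date,Due Date
,,,,
APPLE,,,,
macbook,,12345,1/1/19,2/1/19
iphone,,23456,1/2/19,2/2/19
,,,,
EPSON,,,,
ET-2000 printer,,34567,1/3/19,2/4/19
,,,,
ROLAND,,,,
RD-700,,45678,1/4/19,2/4/19
"""

# Assumption: these first-column labels mark report headers, not companies.
HEADER_LABELS = {"Report Date", "Page 1 of 1", "Project Name"}


def group_by_company(rows):
    """Group project rows under the most recent company heading."""
    current_company = None
    co_dict = defaultdict(list)
    for row in rows:
        name, job = row[0], row[2]
        if not name or name in HEADER_LABELS:
            continue  # blank row or report header
        if not job:
            current_company = name  # heading row: a name but no job number
            continue
        if current_company is None:
            continue  # data row before any company heading
        co_dict[current_company].append(Record(name, job, row[3], row[4]))
    return dict(co_dict)


rows = list(csv.reader(io.StringIO(CSV_TEXT)))
co_dict = group_by_company(rows)
```

With a real worksheet, something like group_by_company(ws.iter_rows(values_only=True)) should work the same way, assuming an openpyxl version that supports the values_only argument, because the function only indexes into plain row tuples.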

Upvotes: 2
