Parsing through a .xlsx sheet using the xlrd Python package

Question

I have an xlsx sheet with sample data as shown below.

fileinfo: sample file | DateCreated: 09/23/2013 | Country: Japan

Num. | Name | Age | Sex | Country

  1  | Mark | 45  | M   | Australia
  2  | Steve| 50  | M   | United kingdom
  3  | Julia| 35  | F   | USA

fileinfo: sample file | DateCreated: 09/23/2013 | Country: Japan

Num. | Name | Age | Sex | Country

  1  | Ronald | 64  | M   | USA
  2  | Micheal| 52  | M   | China
  3  | Zed    | 35  | F   | USA

The fileinfo row can appear anywhere in the file, any number of times. I need to make sure that I don't capture these rows in my code.

Is there a way in xlrd to capture just the headers and the data into a dictionary, leaving out the text rows (the fileinfo rows in this example)?

Here's the code that I have so far.

import xlrd

def importXLSX(fileName):
    with xlrd.open_workbook(fileName) as wb:
        worksheet = wb.sheet_by_index(0)

        num_rows = worksheet.nrows

        # Row 0 is a fileinfo row; the headers are in row 1.
        keys = [x.value for x in worksheet.row(1)]

        data = dict((x, []) for x in keys)

        # Read every row after the header row. This still picks up the
        # repeated fileinfo and header rows further down the sheet.
        for curr_row in range(2, num_rows):
            for i, val in enumerate(worksheet.row(curr_row)):
                data[keys[i]].append(val.value)
    return data


data = importXLSX('simple.xlsx')
print(data)

golldy · Accepted Answer

I finally found a way to do it. xlrd reports a cell type of 0 (XL_CELL_EMPTY) when a cell is empty. So I was able to check each row for empty columns, figure out which rows are header rows, add them to a list, and finally iterate over the list and print them.
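The answer does not include its code, so the following is only a minimal sketch of the idea, not the original implementation. It assumes that header and data rows fill every column, that fileinfo rows leave at least one column empty (as in the sample, where they use three of the five columns), and that empty cells arrive as empty strings. The function name collect_columns is hypothetical, and it works on plain lists so it can run without a spreadsheet; with xlrd, each row could come from worksheet.row_values(i).

```python
def collect_columns(rows):
    """Build {header: [values]} from rows, skipping rows with empty cells.

    Assumption: header and data rows are fully populated, while fileinfo
    rows (and blank rows) contain at least one empty cell.
    """
    keys = None
    data = {}
    for row in rows:
        # Skip fileinfo rows and blank rows: they have an empty cell.
        if len(row) == 0 or any(cell == "" for cell in row):
            continue
        if keys is None:
            # The first fully populated row is taken as the header row.
            keys = list(row)
            data = {key: [] for key in keys}
        elif list(row) == keys:
            # A repeated header row: nothing to store.
            continue
        else:
            for key, value in zip(keys, row):
                data[key].append(value)
    return data


# Rows as they might look after reading the sample sheet
# (short rows padded with empty strings).
sample_rows = [
    ["fileinfo: sample file", "DateCreated: 09/23/2013", "Country: Japan", "", ""],
    ["Num.", "Name", "Age", "Sex", "Country"],
    [1, "Mark", 45, "M", "Australia"],
    [2, "Steve", 50, "M", "United kingdom"],
    [3, "Julia", 35, "F", "USA"],
    ["fileinfo: sample file", "DateCreated: 09/23/2013", "Country: Japan", "", ""],
    ["Num.", "Name", "Age", "Sex", "Country"],
    [1, "Ronald", 64, "M", "USA"],
    [2, "Micheal", 52, "M", "China"],
    [3, "Zed", 35, "F", "USA"],
]

result = collect_columns(sample_rows)
print(result["Name"])
```

If a real data row can legitimately have an empty cell, this check would drop it too; in that case matching the fileinfo rows on their first cell may be a safer test.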
