How to differentiate between two rows of different colours in excel when reading it into pandas dataframe?

Question

I have an excel file in which the first row is the column names and from the 2nd row to say 10th row, the rows are coloured orange, from 11th row to 25th row, they are coloured yellow and from 26th row to 50th row they are coloured green.

Now, rows corresponding to orange have to be in one dataframe, rows corresponding to yellow have to be in the second dataframe, rows corresponding to green have to be in the third dataframe.

Now, I have many excel files of this type and the boundary rows(where colour changes) differs in each sheet and does not follow any pattern and I do not know boundary row numbers.

How can I separate these rows by code, (NOT by manually going and seeing the boundary and then separating about it).

I tried to search it on google but the results are about formatting a dataframe and saving it, what I want is the opposite.

jonboy · Accepted Answer

You can do this using the xlrd package.

from xlrd import open_workbook

book = open_workbook("file.xls", formatting_info=True)
sheets = book.sheet_names()
for index, sh in enumerate(sheets):
    sheet = book.sheet_by_index(index)
    rows, cols = sheet.nrows, sheet.ncols
    for row in range(rows):
        for col in range(cols):
            thecell = sheet.cell(row, col)      
            xfx = sheet.cell_xf_index(row, col)
            xf = book.xf_list[xfx]
            bgx = xf.background.pattern_colour_index
            print(bgx)

How to differentiate between two rows of different colours in excel when reading it into pandas dataframe?

Answers (1)

Related Questions