Reputation: 23
I have an excel file in which the first row is the column names and from the 2nd row to say 10th row, the rows are coloured orange, from 11th row to 25th row, they are coloured yellow and from 26th row to 50th row they are coloured green.
Now, rows corresponding to orange have to be in one dataframe, rows corresponding to yellow have to be in the second dataframe, rows corresponding to green have to be in the third dataframe.
Now, I have many excel files of this type and the boundary rows(where colour changes) differs in each sheet and does not follow any pattern and I do not know boundary row numbers.
How can I separate these rows by code, (NOT by manually going and seeing the boundary and then separating about it).
I tried to search it on google but the results are about formatting a dataframe and saving it, what I want is the opposite.
Upvotes: 2
Views: 166
Reputation: 368
You can do this using the xlrd
package.
from xlrd import open_workbook
book = open_workbook("file.xls", formatting_info=True)
sheets = book.sheet_names()
for index, sh in enumerate(sheets):
sheet = book.sheet_by_index(index)
rows, cols = sheet.nrows, sheet.ncols
for row in range(rows):
for col in range(cols):
thecell = sheet.cell(row, col)
xfx = sheet.cell_xf_index(row, col)
xf = book.xf_list[xfx]
bgx = xf.background.pattern_colour_index
print(bgx)
Upvotes: 2