csaladenes
csaladenes

Reputation: 1209

Get Excel cell background color in pandas read_excel?

I have an .xls Excel file with cells having background colors. I am reading that file into pandas with read_excel. Is there any way to get the background colors of cells?

Upvotes: 23

Views: 29686

Answers (4)

CAPSLOCK
CAPSLOCK

Reputation: 6483

Improving on Sumit's answer (which should be the accepted one in my opinion), you can obtain the color for the whole column by using list comprehension:

import openpyxl
from openpyxl import load_workbook
excel_file = 'my_file.xlsx' 
wb = load_workbook(excel_file, data_only = True)
sh = wb['my_sheet']
# extract color from column A.
color_in_hex = [cell.fill.start_color.index for cell in sh['A:A']]

Upvotes: 6

Sumit Pokhrel
Sumit Pokhrel

Reputation: 2208

The Solution suggested above works only for xls file, not for xlsx file. This raises a NotImplementedError: formatting_info=True not yet implemented. Xlrd library is still not updated to work for xlsx files. So you have to Save As and change the format every time which may not work for you.
Here is a solution for xlsx files using openpyxl library. A2 is the cell whose color code we need to find out.

import openpyxl
from openpyxl import load_workbook
excel_file = 'color_codes.xlsx' 
wb = load_workbook(excel_file, data_only = True)
sh = wb['Sheet1']
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color
print ('HEX =',color_in_hex) 
print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (0, 2, 4))) # Color in RGB

Upvotes: 15

BossaNova
BossaNova

Reputation: 1527

I edited the code snippet from @csaladenes's response above based on this link, and it works for my xls file (the original resulted in all cells showing the same color index, though they have different background colors):

import xlrd
import numpy as np
wb = xlrd.open_workbook(file, formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
bgcol=np.zeros([sheet.nrows,sheet.ncols])
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        c = sheet.cell(row, col)
        cif = sheet.cell_xf_index(row, col)
        iif = wb.xf_list[cif]
        cbg = iif.background.pattern_colour_index
        bgcol[row,col] = cbg

Upvotes: 1

csaladenes
csaladenes

Reputation: 1209

Brute-forced it through xlrd, as per Mark's suggestion:

from xlrd import open_workbook
wb = open_workbook('wb.xls', formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
#create empy colormask matrix
bgcol=np.zeros([sheet.nrows,sheet.ncols])
#cycle through all cells to get colors
for row in range(sheet.nrows):
  for column in range(sheet.ncols):
    cell = sheet.cell(row, column)  
    fmt = wb.xf_list[cell.xf_index]
    bgcol[row,column]=fmt.background.background_colour_index
#return pandas mask of colors
colormask=pd.DataFrame(bgcol) 

Yet, there must be a better way thorugh pandas directly...

Upvotes: 13

Related Questions