Reputation: 1209
I have an .xls
Excel file with cells having background colors. I am reading that file into pandas with read_excel
. Is there any way to get the background colors of cells?
Upvotes: 23
Views: 29686
Reputation: 6483
Improving on Sumit's answer (which should be the accepted one in my opinion), you can obtain the color for the whole column by using list comprehension:
import openpyxl
from openpyxl import load_workbook
excel_file = 'my_file.xlsx'
wb = load_workbook(excel_file, data_only = True)
sh = wb['my_sheet']
# extract color from column A.
color_in_hex = [cell.fill.start_color.index for cell in sh['A:A']]
Upvotes: 6
Reputation: 2208
The Solution suggested above works only for xls
file, not for xlsx
file. This raises a NotImplementedError: formatting_info=True not yet implemented
. Xlrd
library is still not updated to work for xlsx
files. So you have to Save As
and change the format every time which may not work for you.
Here is a solution for xlsx
files using openpyxl
library. A2
is the cell whose color code we need to find out.
import openpyxl
from openpyxl import load_workbook
excel_file = 'color_codes.xlsx'
wb = load_workbook(excel_file, data_only = True)
sh = wb['Sheet1']
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color
print ('HEX =',color_in_hex)
print('RGB =', tuple(int(color_in_hex[i:i+2], 16) for i in (0, 2, 4))) # Color in RGB
Upvotes: 15
Reputation: 1527
I edited the code snippet from @csaladenes's response above based on this link, and it works for my xls file (the original resulted in all cells showing the same color index, though they have different background colors):
import xlrd
import numpy as np
wb = xlrd.open_workbook(file, formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
bgcol=np.zeros([sheet.nrows,sheet.ncols])
for row in range(sheet.nrows):
for col in range(sheet.ncols):
c = sheet.cell(row, col)
cif = sheet.cell_xf_index(row, col)
iif = wb.xf_list[cif]
cbg = iif.background.pattern_colour_index
bgcol[row,col] = cbg
Upvotes: 1
Reputation: 1209
Brute-forced it through xlrd
, as per Mark's suggestion:
from xlrd import open_workbook
wb = open_workbook('wb.xls', formatting_info=True)
sheet = wb.sheet_by_name("mysheet")
#create empy colormask matrix
bgcol=np.zeros([sheet.nrows,sheet.ncols])
#cycle through all cells to get colors
for row in range(sheet.nrows):
for column in range(sheet.ncols):
cell = sheet.cell(row, column)
fmt = wb.xf_list[cell.xf_index]
bgcol[row,column]=fmt.background.background_colour_index
#return pandas mask of colors
colormask=pd.DataFrame(bgcol)
Yet, there must be a better way thorugh pandas directly...
Upvotes: 13