Bens
Bens

Reputation: 203

Using R to read out excel-colorinfo

Is there any way to read out the color-index of cells from excel files with R?

While I can set the cell color with packages like XLConnect or XLSX, I have found no way to extract the color-information from existing workbooks.

Upvotes: 20

Views: 10487

Answers (2)

M--
M--

Reputation: 29238

R-Bloggers provided a function that will do the job for you. I am including the answer here for future reference.

Read the Excel file using xlsx package:

library(xlsx)
wb     <- loadWorkbook("test.xlsx")
sheet1 <- getSheets(wb)[[1]]

# get all rows
rows  <- getRows(sheet1)
cells <- getCells(rows)

This part extracts the information that later will be used for getting background color (or other style information) of the cells:

styles <- sapply(cells, getCellStyle) #This will get the styles

This is the function that identifies/extracts the cell background color:

cellColor <- function(style) 
   {
    fg  <- style$getFillForegroundXSSFColor()
    rgb <- tryCatch(fg$getRgb(), error = function(e) NULL)
    rgb <- paste(rgb, collapse = "")
    return(rgb)
   }

error will handle the cells with no background color.

Using sapply you can get the background color for all of the cells:

sapply(styles, cellColor)

You can also categorize/identify them by knowing the RGb codes:

mycolor <- list(green = "00ff00", red = "ff0000")
m     <- match(sapply(styles, cellColor), mycolor)
labs  <-names(mycolor)[m]

You can read more and learn how to apply it at R-bloggers and get the RGB codes from RapidTables.com

Upvotes: 24

Douglas Mesquita
Douglas Mesquita

Reputation: 1021

Old question but maybe it can help someone in the future.

There is a strange behavior in the POI (java) library (at least on my computer). It is not getting the colors correctly. The code provided in the @M--'s answer works well when the color is a basic color (indexed color), but does not work when the color is, for example, in grayscale. To get around you can use the following code using the getTint () function. Tint is a number between -1 (dark) and 1 (light), and combining it with the RGB (getRgb ()) function, you can completely recover the color.

cell_color <- function(style){
  fg  <- style$getFillForegroundXSSFColor()

  hex <- tryCatch(fg$getRgb(), error = function(e) NULL)
  hex <- paste0("#", paste(hex, collapse = ""))
  tint <- tryCatch(fg$getTint(), error = function(e) NULL)

  if(!is.null(tint) & !is.null(hex)){   # Tint varies between -1 (dark) and 1 (light)
    rgb_col <- col2rgb(col = hex)

    if(tint < 0) rgb_col <- (1-abs(tint))*rgb_col
    if(tint > 0) rgb_col <- rgb_col + (255-rgb_col)*tint

    hex <- rgb(red = rgb_col[1, 1], 
               green = rgb_col[2, 1], 
               blue = rgb_col[3, 1], 
               maxColorValue = 255)
  }

  return(hex)
}

Some references to help:

https://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFExtendedColor.html#getTint--

https://bz.apache.org/bugzilla/show_bug.cgi?id=50787

Getting Excel fill colors using Apache POI

Upvotes: 3

Related Questions