user1828605
user1828605

Reputation: 1735

Filter data highlighted in Excel by cell fill color using openxlsx

I have a big Excel table (18k rows and 400 cols) which has some rows highlighted using different colors. Is there a way to filter the rows by colors using openxlsx?

I first loaded the workbook

wb <- loadWorkbook(file = "Items Comparison.xlsx")
getStyles(wb)
df <- read.xlsx(wb, sheet = 1)

I see the styles used in the workbook using getStyles(wb), but not sure how to use that information to filter all cells for each column by colors.

[[1]]
A custom cell style. 

 Cell formatting: GENERAL 
 Font name: Tahoma 
 Font size: 9 
 Font colour: #FFFFFF 
 Font decoration: BOLD 
 Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin 
 Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A 
 Cell vert. align: top 
 Cell fill foreground:  rgb: #384C70 
 Cell fill background:  rgb: #384C70 
 wraptext: TRUE 


[[2]]
A custom cell style. 

 Cell formatting: GENERAL 
 Font name: Tahoma 
 Font size: 9 
 Font colour: #FFFFFF 
 Font decoration: BOLD 
 Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin 
 Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A 
 Cell vert. align: top 
 Cell fill foreground:  rgb: #384C70 
 Cell fill background:  rgb: #384C70 
 wraptext: TRUE 

What can I do to filter data by fill colors?

UPDATE

Based on @Henrik solution, I tried to use his code but I kept getting error. So, to understand what was going on, I printed the output of x$style$fill$fillFg

       rgb 
"FF384C70" 
       rgb 
"FF384C70" 
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
       rgb 
"FF384C70" 
NULL
NULL
NULL
       rgb 
"FFFFFF00" 
       rgb 
"FFFFFF00" 
 theme 
   "0" 
 theme 
   "0" 
       rgb 
"FFFFFF00" 
NULL
 theme 
   "2" 
                theme                  tint 
                  "4" "0.79998168889431442" 
 theme 
   "8" 
 theme 
   "8" 
       rgb 
"FFFFC000" 
       rgb 
"FFFFC000" 
                theme                  tint 
                  "5" "0.39997558519241921" 
                theme                  tint 
                  "5" "0.39997558519241921" 
                theme                  tint 
                  "9" "0.39997558519241921" 
                theme                  tint 
                  "5" "0.79998168889431442" 
       rgb 
"FFFFFF00" 
       rgb 
"FF384C70" 
NULL
NULL
NULL
       rgb 
"FF384C70" 
       rgb 
"FF384C70" 
[[1]]
       rgb 
"FF384C70" 

[[2]]
       rgb 
"FF384C70" 

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

[[7]]
NULL

[[8]]
NULL

[[9]]
NULL

[[10]]
NULL

[[11]]
NULL

[[12]]
NULL

[[13]]
       rgb 
"FF384C70" 

[[14]]
NULL

[[15]]
NULL

[[16]]
NULL

[[17]]
       rgb 
"FFFFFF00" 

[[18]]
       rgb 
"FFFFFF00" 

[[19]]
 theme 
   "0" 

[[20]]
 theme 
   "0" 

[[21]]
       rgb 
"FFFFFF00" 

[[22]]
NULL

[[23]]
 theme 
   "2" 

[[24]]
                theme                  tint 
                  "4" "0.79998168889431442" 

[[25]]
 theme 
   "8" 

[[26]]
 theme 
   "8" 

[[27]]
       rgb 
"FFFFC000" 

[[28]]
       rgb 
"FFFFC000" 

[[29]]
                theme                  tint 
                  "5" "0.39997558519241921" 

[[30]]
                theme                  tint 
                  "5" "0.39997558519241921" 

[[31]]
                theme                  tint 
                  "9" "0.39997558519241921" 

[[32]]
                theme                  tint 
                  "5" "0.79998168889431442" 

[[33]]
       rgb 
"FFFFFF00" 

[[34]]
       rgb 
"FF384C70" 

[[35]]
NULL

[[36]]
NULL

[[37]]
NULL

[[38]]
       rgb 
"FF384C70" 

[[39]]
       rgb 
"FF384C70" 

I'm still confused why there're only 39 items. total number of rows is variable but not 39. I'm also not understanding the operation - is it rowwise or columnwise?

Upvotes: 6

Views: 2822

Answers (3)

Soren
Soren

Reputation: 2425

A solution using openxlsx package Example below looks for color "FFC000" and looks in columns 1 and 6 The approach first identifies which defined styles have the font color of interest, then looks through the style objects to see which cells those styles have been applied to, returning the index of the rows that match the color and the pre-defined column search. The result would give all rows where at least one cell within the column search has the specified color.

excelwb <- openxlsx::loadWorkbook(excel_file)
strikestyles <- getStyles(excelwb)
goldcolors <- which(sapply(strikestyles,'[[','fontColour')=="FFFFC000") 
goldcols <- c(1,6) #these are the columns that have the gold color of interest -- could also be 1:ncol
goldrows <- lapply(excelwb$styleObjects[goldcolors],
                     function(x) {
                       value_cols <- which(x$cols %in% goldcols) 
                       if (length(value_cols)==0) return(NULL)
                       else return (x$rows[value_cols])
                     })
goldrows <- as.numeric(unlist(goldrows))

Upvotes: 0

Henrik
Henrik

Reputation: 67778

In your workbook object, you find the styleObjects element. There you can dig your way to the fill color (style$fill$fillFg) and row elements. Loop over the style objects (lapply), check if the color is desired one (e.g. red, "FFFF0000"; x$style$fill$fillFg == "FFFF0000", and grab the row index (x$rows[1]).

enter image description here

wb <- loadWorkbook(file = "foo.xlsx")
unlist(lapply(wb$styleObjects, function(x){
  x$rows[1][x$style$fill$fillFg == "FFFF0000"]}))

# [1] 3

If the colored cells are non-contiguous, you may want to grab both rows and columns:

enter image description here

l = lapply(wb$styleObjects, function(x){
  if(x$style$fill$fillFg == "FFFF0000"){
    data.frame(ri = x$rows, ci = x$cols, col = "FFFF0000")}})
l[lengths(l) > 0]

# [[1]]
#   ri ci      col
# 1  1  2 FFFF0000
# 2  2  3 FFFF0000
# 3  3  1 FFFF0000

Upvotes: 2

Wimpel
Wimpel

Reputation: 27732

enter image description here

library(tidyxl)

formats <- xlsx_formats( "./temp/test_file.xlsx" )
cells <- xlsx_cells( "./temp/test_file.xlsx" )

#what colors are used?
formats$local$fill$patternFill$fgColor$rgb
# [1] NA         "FFC00000" "FF00B0F0" NA  

#find rows fo cells  with red background
cells[ cells$local_format_id %in%
         which( formats$local$fill$patternFill$fgColor$rgb == "FFC00000"), 
       "row" ]

# [1] 1

Upvotes: 8

Related Questions