Reputation: 1735
I have a big Excel table (18k rows and 400 cols) which has some rows highlighted using different colors. Is there a way to filter the rows by colors using openxlsx
?
I first loaded the workbook
wb <- loadWorkbook(file = "Items Comparison.xlsx")
getStyles(wb)
df <- read.xlsx(wb, sheet = 1)
I see the styles used in the workbook using getStyles(wb)
, but not sure how to use that information to filter all cells for each column by colors.
[[1]]
A custom cell style.
Cell formatting: GENERAL
Font name: Tahoma
Font size: 9
Font colour: #FFFFFF
Font decoration: BOLD
Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin
Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A
Cell vert. align: top
Cell fill foreground: rgb: #384C70
Cell fill background: rgb: #384C70
wraptext: TRUE
[[2]]
A custom cell style.
Cell formatting: GENERAL
Font name: Tahoma
Font size: 9
Font colour: #FFFFFF
Font decoration: BOLD
Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin
Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A
Cell vert. align: top
Cell fill foreground: rgb: #384C70
Cell fill background: rgb: #384C70
wraptext: TRUE
What can I do to filter data by fill colors?
UPDATE
Based on @Henrik solution, I tried to use his code but I kept getting error. So, to understand what was going on, I printed the output of x$style$fill$fillFg
rgb
"FF384C70"
rgb
"FF384C70"
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
rgb
"FF384C70"
NULL
NULL
NULL
rgb
"FFFFFF00"
rgb
"FFFFFF00"
theme
"0"
theme
"0"
rgb
"FFFFFF00"
NULL
theme
"2"
theme tint
"4" "0.79998168889431442"
theme
"8"
theme
"8"
rgb
"FFFFC000"
rgb
"FFFFC000"
theme tint
"5" "0.39997558519241921"
theme tint
"5" "0.39997558519241921"
theme tint
"9" "0.39997558519241921"
theme tint
"5" "0.79998168889431442"
rgb
"FFFFFF00"
rgb
"FF384C70"
NULL
NULL
NULL
rgb
"FF384C70"
rgb
"FF384C70"
[[1]]
rgb
"FF384C70"
[[2]]
rgb
"FF384C70"
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL
[[10]]
NULL
[[11]]
NULL
[[12]]
NULL
[[13]]
rgb
"FF384C70"
[[14]]
NULL
[[15]]
NULL
[[16]]
NULL
[[17]]
rgb
"FFFFFF00"
[[18]]
rgb
"FFFFFF00"
[[19]]
theme
"0"
[[20]]
theme
"0"
[[21]]
rgb
"FFFFFF00"
[[22]]
NULL
[[23]]
theme
"2"
[[24]]
theme tint
"4" "0.79998168889431442"
[[25]]
theme
"8"
[[26]]
theme
"8"
[[27]]
rgb
"FFFFC000"
[[28]]
rgb
"FFFFC000"
[[29]]
theme tint
"5" "0.39997558519241921"
[[30]]
theme tint
"5" "0.39997558519241921"
[[31]]
theme tint
"9" "0.39997558519241921"
[[32]]
theme tint
"5" "0.79998168889431442"
[[33]]
rgb
"FFFFFF00"
[[34]]
rgb
"FF384C70"
[[35]]
NULL
[[36]]
NULL
[[37]]
NULL
[[38]]
rgb
"FF384C70"
[[39]]
rgb
"FF384C70"
I'm still confused why there're only 39 items. total number of rows is variable but not 39. I'm also not understanding the operation - is it rowwise or columnwise?
Upvotes: 6
Views: 2822
Reputation: 2425
A solution using openxlsx package Example below looks for color "FFC000" and looks in columns 1 and 6 The approach first identifies which defined styles have the font color of interest, then looks through the style objects to see which cells those styles have been applied to, returning the index of the rows that match the color and the pre-defined column search. The result would give all rows where at least one cell within the column search has the specified color.
excelwb <- openxlsx::loadWorkbook(excel_file)
strikestyles <- getStyles(excelwb)
goldcolors <- which(sapply(strikestyles,'[[','fontColour')=="FFFFC000")
goldcols <- c(1,6) #these are the columns that have the gold color of interest -- could also be 1:ncol
goldrows <- lapply(excelwb$styleObjects[goldcolors],
function(x) {
value_cols <- which(x$cols %in% goldcols)
if (length(value_cols)==0) return(NULL)
else return (x$rows[value_cols])
})
goldrows <- as.numeric(unlist(goldrows))
Upvotes: 0
Reputation: 67778
In your workbook object, you find the styleObjects
element. There you can dig your way to the fill color (style$fill$fillFg
) and row
elements. Loop over the style objects (lapply
), check if the color is desired one (e.g. red, "FFFF0000"; x$style$fill$fillFg == "FFFF0000"
, and grab the row index (x$rows[1]
).
wb <- loadWorkbook(file = "foo.xlsx")
unlist(lapply(wb$styleObjects, function(x){
x$rows[1][x$style$fill$fillFg == "FFFF0000"]}))
# [1] 3
If the colored cells are non-contiguous, you may want to grab both rows and columns:
l = lapply(wb$styleObjects, function(x){
if(x$style$fill$fillFg == "FFFF0000"){
data.frame(ri = x$rows, ci = x$cols, col = "FFFF0000")}})
l[lengths(l) > 0]
# [[1]]
# ri ci col
# 1 1 2 FFFF0000
# 2 2 3 FFFF0000
# 3 3 1 FFFF0000
Upvotes: 2
Reputation: 27732
library(tidyxl)
formats <- xlsx_formats( "./temp/test_file.xlsx" )
cells <- xlsx_cells( "./temp/test_file.xlsx" )
#what colors are used?
formats$local$fill$patternFill$fgColor$rgb
# [1] NA "FFC00000" "FF00B0F0" NA
#find rows fo cells with red background
cells[ cells$local_format_id %in%
which( formats$local$fill$patternFill$fgColor$rgb == "FFC00000"),
"row" ]
# [1] 1
Upvotes: 8