DanG

Reputation: 741

Visualize presence/absence hourly data with ggplot

I have cell health-signal data that the server should receive every hour. After data manipulation and pivoting, I would like to create a presence/absence chart from it so that I can find spotty connections per id/date for later troubleshooting.

Here is my sample data after pivot_wider: if the server received a signal in a given hour the value is 1, and if no signal was received it is NA.

df <- tibble::tribble(
  ~id,        ~date,     ~n,   ~s, ~hour_1, ~hour_2, ~hour_3, ~hour_4, ~hour_5, ~hour_6, ~hour_7, ~hour_8, ~hour_9, ~hour_10, ~hour_11, ~hour_12, ~hour_13, ~hour_14, ~hour_15, ~hour_16, ~hour_17, ~hour_18, ~hour_19, ~hour_20, ~hour_21, ~hour_22, ~hour_23, ~hour_24,
   1L, "2022-07-26", "CELL", TRUE,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,
   2L, "2022-07-26", "CELL", TRUE,      1L,      NA,      NA,      NA,      NA,      NA,      NA,      NA,      1L,       1L,       1L,       1L,       1L,       NA,       NA,       1L,       1L,       1L,       1L,       NA,       NA,       1L,       1L,       NA,
   3L, "2022-07-26", "CELL", TRUE,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,
   4L, "2022-07-26", "CELL", TRUE,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,
   5L, "2022-07-26", "CELL", TRUE,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,       1L,       NA,       NA,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,       1L,
   6L, "2022-07-26", "CELL", TRUE,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,      1L,       1L,       1L,       1L,       1L,       1L,       NA,       NA,       NA,       NA,       NA,       1L,       1L,       1L,       1L,       1L
  )

The chart I have in mind is a grid: assuming we have filtered for a specific date, each id is a row and each hour is a cell that is colored when a signal was present and left uncolored when it was absent.

Upvotes: 1

Views: 289

Answers (2)

TarJae

Reputation: 79204

Update:

Here is a version with the cell text hidden (the text color is set to match the fill color, so the 0/1 labels disappear):

The key pointers were the questions "Conditional formatting of multiple columns in gt table", "How can I color the same value in the same color in the entire gt table in R?", and "change font color conditionally in multiple columns using gt()".

library(dplyr)
library(tidyr)
library(gt)

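# Helpers that build a gt cell location for one column: the rows where that
# column equals 1 (signal) or 0 (no signal). `Limit` is never used in the body.
text_color_1 <- function(x, Limit){cells_body(columns = !!sym(x), rows = !!sym(x) == 1)}
text_color_0 <- function(x, Limit){cells_body(columns = !!sym(x), rows = !!sym(x) == 0)}
# Names of the 24 hour_* columns (everything after id, date, n, s)
names <- colnames(df[-c(1:4)])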

df %>% 
  mutate(across(starts_with("hour"), ~replace_na(., 0))) %>% 
  select(-date, -n, -s) %>% 
  gt() %>% 
  data_color(
    columns = starts_with("hour"),
    colors = scales::col_numeric(
      palette = c("white", "green"),
      domain = c(0,1)
    )) %>% 
  tab_style(
    style = list(
      cell_borders(
        sides = c("top", "bottom"),
        color = "#C0C0C0",
        weight = px(2)
      ),
      cell_borders(
        sides = c("left", "right"),
        color = "#C0C0C0",
        weight = px(2)
      )
    ),
    locations = list(
      cells_body(
        columns = starts_with("hour")
      )
    )) %>% 
  # Match the text color to the fill color so the 0/1 labels are not visible
  tab_style(style = list(cell_text(color = "green"), cell_text(weight = "bold")),
            locations = lapply(names, text_color_1)) %>% 
  tab_style(style = list(cell_text(color = "white"), cell_text(weight = "bold")),
            locations = lapply(names, text_color_0))


First try:

This solution colors the whole dataset; you could filter it to a specific date first.

The trick is to use the data_color() function from the gt package and to set the domain of scales::col_numeric(). See the Examples section at https://gt.rstudio.com/reference/data_color.html

library(dplyr)
library(tidyr)
library(gt)

df %>% 
  mutate(across(starts_with("hour"), ~replace_na(., 0))) %>% 
  gt() %>% 
  data_color(
    columns = starts_with("hour"),
    colors = scales::col_numeric(
      palette = c("white", "green"),
      domain = c(0,1)
    ))

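If you do want a single date, as the question suggests, a minimal sketch of the filtering step could look like the following. The small `demo` tibble and the names `one_day` and `tbl` are stand-ins made up for illustration (only three hour columns, assumed to have the same layout as the question's df); swap in the real df and date.

```r
library(dplyr)
library(tidyr)
library(gt)

# Hypothetical stand-in for the question's df, cut down to three hours
demo <- tibble::tribble(
  ~id,        ~date, ~hour_1, ~hour_2, ~hour_3,
   1L, "2022-07-26",      1L,      1L,      1L,
   2L, "2022-07-26",      1L,      NA,      1L,
   1L, "2022-07-27",      NA,      NA,      1L
)

# Keep one date, turn NA (no signal) into 0, and drop the now-constant date
one_day <- demo %>%
  filter(date == "2022-07-26") %>%
  mutate(across(starts_with("hour"), ~replace_na(., 0L))) %>%
  select(-date)

# Same coloring as above: 0 maps to white, 1 maps to green
tbl <- one_day %>%
  gt() %>%
  data_color(
    columns = starts_with("hour"),
    colors = scales::col_numeric(
      palette = c("white", "green"),
      domain = c(0, 1)
    ))

tbl
```

Filtering before gt() keeps the table to one row per id, which is closer to the layout described in the question.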

Upvotes: 2

Allan Cameron

Reputation: 174476

In ggplot you could do:

library(tidyverse)

df %>%
  select(-(2:4)) %>%
  pivot_longer(-1, names_to = "hour", values_to = "on_off") %>%
  mutate(hour = factor(as.numeric(sub("hour_", "", hour))),
         on_off = factor(on_off),
         id = factor(id)) %>%
  ggplot(aes(hour, id, fill = on_off)) +
  geom_tile(color = "gray30") +
  coord_equal() +
  scale_fill_manual(values = "#7AE063", na.value = "white", guide = "none") +
  theme_minimal(base_size = 16)

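Since the stated goal is to find spotty connections per id/date, the same long reshape can also feed a quick numeric summary alongside the plot. This is only a sketch under assumptions: `demo` is a made-up three-hour stand-in for the question's df, and the names `gaps`, `missing_hours` and `first_gap` are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's df, cut down to three hours
demo <- tibble::tribble(
  ~id,        ~date, ~hour_1, ~hour_2, ~hour_3,
   1L, "2022-07-26",      1L,      1L,      1L,
   2L, "2022-07-26",      1L,      NA,      1L,
   1L, "2022-07-27",      NA,      NA,      1L
)

gaps <- demo %>%
  # One row per id/date/hour; strip the "hour_" prefix from the column names
  pivot_longer(starts_with("hour_"), names_to = "hour",
               names_prefix = "hour_", values_to = "on_off") %>%
  mutate(hour = as.integer(hour)) %>%
  group_by(id, date) %>%
  summarise(
    # Number of hours with no signal (NA) for this id/date
    missing_hours = sum(is.na(on_off)),
    # Earliest hour without a signal, or NA if the day was complete
    first_gap = if (any(is.na(on_off))) min(hour[is.na(on_off)]) else NA_integer_,
    .groups = "drop"
  ) %>%
  # Worst id/date combinations first
  arrange(desc(missing_hours))

gaps
```

Sorting by the number of missing hours puts the least reliable id/date pairs at the top, which may help decide which rows of the tile plot to inspect first.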

Upvotes: 2
