Kaoutar

Reputation: 365

How to make a pie chart with overlap in R?

I have a dataset with two columns (plus a row index). A small example:

    A      B    
 1  sety   NA
 2  NA     bety
 3  NA     bety
 4  sety   bety
 5  sety   NA

How can I plot a pie chart that shows the percentage of people having sety, the percentage of people having bety, and the percentage of people having both sety and bety? In the example above, these percentages are 60%, 60%, and 20% respectively. The third percentage must be shown as the overlap between the first two.

Maybe something like this:

enter image description here
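
For reference, the three percentages quoted above can be checked directly in base R. This is only a minimal sketch: the data frame name `dat` is an assumption (the question does not name it), and NA is taken to mean "does not have this value".

```r
# Example data from the question; the name `dat` is assumed for illustration
dat <- data.frame(
  A = c("sety", NA, NA, "sety", "sety"),
  B = c(NA, "bety", "bety", "bety", NA)
)

has_a <- !is.na(dat$A)  # TRUE where the person has sety
has_b <- !is.na(dat$B)  # TRUE where the person has bety

# Share of rows in each set, and in both sets at once, as percentages
pct <- c(
  sety = mean(has_a),
  bety = mean(has_b),
  both = mean(has_a & has_b)
) * 100

print(pct)
```

Note that sety and bety sum to more than 100% precisely because one row is counted in both, which is why an ordinary pie chart cannot show them without an overlap.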

Upvotes: 1

Views: 198

Answers (2)

zephryl

Reputation: 17204

  1. Arrange and summarize your data to find “start” and “stop” positions for each set, and convert to percentages. You’ll use these positions as ymin and ymax for geom_rect().
  2. Take the midpoint between ymin and ymax as the position for your % labels.
  3. Create a separate dataframe with the value and position for the overlap label.
  4. Dodge the rects and labels for each group using position_dodge(), and add transparency using alpha to emphasize the overlap.
  5. Circularize using coord_polar().
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

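# Example data from the question (NA means the value is absent for that row)
dat <- data.frame(
  A = c("sety", NA, NA, "sety", "sety"),
  B = c(NA, "bety", "bety", "bety", NA)
)

plot_data <- dat %>% 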
  arrange(A, !is.na(B)) %>% 
  summarize(across(
    A:B, 
    list(
      ymin = ~ (first(which(!is.na(.x))) - 1) / n(), 
      ymax = ~ last(which(!is.na(.x))) / n(),
      pct = ~ sum(!is.na(.x)) / n()
    )
  )) %>% 
  pivot_longer(everything(), names_to = c("set", ".value"), names_sep = "_") %>%
  mutate(
    xmin = .1, 
    xmax = 1,
    x = xmin + ((xmax - xmin) / 2),
    y = ymin + ((ymax - ymin) / 2)
  )

overlap <- with(
  plot_data, 
  tibble(x = median(x), y = median(c(max(ymin), min(ymax))), pct = sum(pct) - 1)
)

ggplot(plot_data, aes(x, y)) +
  geom_rect(
    aes(xmin = .1, xmax = 1, ymin = ymin, ymax = ymax, fill = set),
    color = "black",
    alpha = .5,
    position = position_dodge(width = .2)
  ) +
  geom_text(
    aes(label = percent(pct), group = set), 
    position = position_dodge(width = .2),
    size = 16 / .pt,
    fontface = "bold"
  ) +
  geom_text(
    data = overlap, 
    aes(label = percent(pct)),
    size = 16 / .pt,
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("#E41A1C", "#377EB8")) +
  scale_x_continuous(limits = 0:1) +
  coord_polar(theta = "y") +
  theme_void()
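
A note on the overlap label: `sum(pct) - 1` is inclusion-exclusion, so it equals the true overlap only when every row belongs to at least one set. The following is a minimal sketch of a direct computation that does not rely on that condition. It assumes `dat` holds the question's example data, which is redefined here so the snippet runs on its own; the variable names `overlap_direct` and `overlap_incl_excl` are illustrative.

```r
# Example data from the question (assumed to be what `dat` refers to above)
dat <- data.frame(
  A = c("sety", NA, NA, "sety", "sety"),
  B = c(NA, "bety", "bety", "bety", NA)
)

# Direct overlap: share of rows that are non-missing in both columns
overlap_direct <- mean(!is.na(dat$A) & !is.na(dat$B))

# Inclusion-exclusion, as used for the overlap label above
overlap_incl_excl <- mean(!is.na(dat$A)) + mean(!is.na(dat$B)) - 1

# The two agree here because no row is NA in both columns
print(c(direct = overlap_direct, incl_excl = overlap_incl_excl))
```

If the real data can contain rows where both A and B are NA, the direct version is probably the safer value to use for the label.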

Upvotes: 1

stomper

Reputation: 1385


library(ggplot2)
library(stringr)
library(dplyr)

# Initial dataset
data <- data.frame(
  A = c("sety", NA, NA, "sety", "sety"),
  B = c(NA, "bety", "bety", "bety", NA)
)

# Combine both columns and strip the "NA" text that paste0() inserts for missing values
data$C <- str_remove_all(paste0(data$A, data$B), "NA")

# Get the frequency of each combined value
freq <- data %>%
  group_by(C) %>%
  summarize(count = n())

# Plot a stacked bar and wrap it into a pie
ggplot(freq, aes(x = "", y = count, fill = C)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0)

enter image description here
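
Note that this approach draws mutually exclusive slices (sety only, bety only, both), so for the example data the shares are 40%, 40%, and 20% rather than the overlapping 60%, 60%, and 20%. As a hedged variation, the sketch below classifies rows from the NA pattern instead of removing "NA" from pasted text (which would also strip "NA" occurring inside a real value) and adds percentage labels. The category names, the "neither" fallback, and the `label` column are assumptions made for illustration.

```r
library(ggplot2)

# Initial dataset, as in the answer above
data <- data.frame(
  A = c("sety", NA, NA, "sety", "sety"),
  B = c(NA, "bety", "bety", "bety", NA)
)

# Classify each row from its NA pattern (category names are illustrative)
has_a <- !is.na(data$A)
has_b <- !is.na(data$B)
data$C <- ifelse(has_a & has_b, "both",
          ifelse(has_a, "sety only",
          ifelse(has_b, "bety only", "neither")))

# Count each category and convert the counts to percentage labels
freq <- as.data.frame(table(C = data$C), responseName = "count")
freq$pct <- freq$count / sum(freq$count)
freq$label <- paste0(round(100 * freq$pct), "%")

# Stacked bar wrapped into a pie, with a label centered in each slice
p <- ggplot(freq, aes(x = "", y = count, fill = C)) +
  geom_col(width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar("y", start = 0) +
  theme_void()

print(p)
```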

Upvotes: 0
