Phil of Fins

How to swap the geom_points of a ggplot lollipop plot for lil pie charts to show proportion of data points assessed for the plot

I've made a plot I am mostly happy with, using Tidyverse in R, but the plot needs to show a bit more information and I haven't managed to work out how to do that yet.

The point of the plot is to show how a bunch of cells from three different animals were algorithmically sorted and grouped together according to their biology. Each animal had many different celltypes, and the algorithm output many clusters of cells. I am plotting one output cluster: after looking at all the cells from each animal that were sorted into this cluster, I chose to show the top 5 source celltypes per animal. The plot shows this nicely (to me, at least), but it doesn't show whether ALL the cells of a given source celltype were bundled into this new cluster, or half, or almost none, etc.
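The proportion in question is defined per source celltype. It is the number of that celltype's cells in this cluster divided by its total number of cells in the source dataset, which is the `fraction_of_total_celltype_abundance` column in the code below. Here is a minimal base-R sketch using three frog rows from the toy data. `top_n_per_group()` is a hypothetical helper that illustrates the "top 5 per animal" selection. With dplyr, `group_by()` followed by `slice_max(n = 5)` is the usual route.

```r
# Three frog rows from the question's toy data
cells_in_cluster <- c(celltype1 = 253, celltype2 = 245, celltype3 = 226)
cells_in_source  <- c(celltype1 = 413, celltype2 = 312, celltype3 = 349)

# Fraction of each source celltype that the algorithm put into this cluster
fraction_captured <- cells_in_cluster / cells_in_source
round(fraction_captured, 3)   # celltype1 0.613, celltype2 0.785, celltype3 0.648

# Hypothetical helper: keep the n rows with the largest count within each group
top_n_per_group <- function(df, group_col, count_col, n = 5) {
  pieces <- split(df, df[[group_col]])
  kept <- lapply(pieces, function(d) {
    head(d[order(d[[count_col]], decreasing = TRUE), , drop = FALSE], n)
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}
```

For example, `top_n_per_group(fake_dataframe, "species_organ", "count_in_integratedcluster", n = 5)` would keep up to five rows per species.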

Here is the code I used, and the plot I got (and mostly like!).

library(tidyverse)
# create the contents of the toy dataset, then add together
species_organ <- c(rep("frog", 5),
                   rep("bat", 5),
                   rep("bird", 5)
)
annotation <- c("celltype1", "celltype2", "celltype3", "celltype4", "celltype5",
                "celltypeA", "celltypeB", "celltypeC", "celltypeD", "celltypeE",
                "celltypeAlpha", "celltypeBeta", "celltypeGamma", "celltypeDelta", "celltypeEpsilon"
)
count_in_integratedcluster <- c(253, 245, 226, 187, 185, 42, 18, 17, 11, 9, 58, 16, 8, 8, 7)
annotation_count_in_source_dataset <- c(413, 312, 349, 410, 233, 195, 198, 56, 166, 238, 82, 68, 270, 226, 81)
fraction_of_total_celltype_abundance <- count_in_integratedcluster / annotation_count_in_source_dataset

fake_dataframe <- data.frame(species_organ, annotation, count_in_integratedcluster, annotation_count_in_source_dataset, fraction_of_total_celltype_abundance)

# a few other things to decorate the plot with
how_many_cells_in_this_integrated_cluster <- 5056
cluster_name <- "cluster6"

# now we make a lollipop plot
plot_lollipop_faceted.top5 <- ggplot(fake_dataframe) +
  geom_segment( aes(x=annotation, xend=annotation, y=0, yend=count_in_integratedcluster), color="grey") +
  geom_point( aes(x=annotation, y=count_in_integratedcluster, color=species_organ), size=3 ) +
  coord_flip()+
  theme(
    legend.position = "none",
    panel.border = element_blank(),
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  xlab("") +
  ylab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
  facet_wrap(~species_organ, ncol=1, scales="free_y") +
  labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"), 
       subtitle = "In this integrated cluster, see what cells contribute per species")

[Figure: the current lollipop plot, which I mostly like but which needs improvement]

An "easy" graphical fix would be to replace each geom_point with a cute little pie chart whose filled portion reports whether 90% or just 10% of, say, "bird muscle cells" were ultimately apportioned to this cluster by the algorithm.
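Geometrically, such a pie is a full circle plus a filled wedge whose angle is the captured fraction times 360 degrees. Below is a minimal base-R sketch of the wedge outline. `pie_wedge()` is a hypothetical helper, and its output is meant for `ggplot2::geom_polygon()`. The radius is in data units, so the aspect-ratio caveat discussed in the answers applies.

```r
# Hypothetical helper: vertices of the filled slice of a pie centred at (x0, y0).
# The slice starts at 12 o'clock and sweeps clockwise through frac * 360 degrees.
pie_wedge <- function(x0, y0, r, frac, n_arc = 50) {
  stopifnot(frac >= 0, frac <= 1)
  theta <- seq(0, 2 * pi * frac, length.out = n_arc)  # angles along the arc
  data.frame(
    x = c(x0, x0 + r * sin(theta), x0),  # polygon starts and ends at the centre
    y = c(y0, y0 + r * cos(theta), y0)
  )
}

# A quarter-filled pie: the arc runs from 12 o'clock to 3 o'clock
wedge <- pie_wedge(x0 = 0, y0 = 0, r = 1, frac = 0.25)
nrow(wedge)  # 52: centre, 50 arc points, centre
```

To get the pencil-sketch look, one wedge per celltype could be stacked with an id column and drawn with `geom_polygon(aes(x, y, group = id))` on top of a white circle.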

Here is a pencil sketch of how the graph could look if I made the swap I am looking for.

[Figure: pencil sketch of the improved plot]

Any solution has to be in R, and I would appreciate Tidyverse-based approaches but I'm willing to try other approaches that convey the desired set of information.

I've looked at other related questions, but unfortunately I couldn't make the suggested methods work for me, or the suggested solution doesn't seem to be useful in my scenario. So far, I've examined:

- R::ggplot2::geom_points: how to swap points with pie charts? (the scatterpie docs didn't help me make sense of how to implement the suggestions)
- ggplot use small pie charts as points with geom_point (the pie is nice, but I don't want to lose the other information my plot already conveys)
- Plotting pie charts in ggplot2 (the title sounds right, but the content was not helpful)
- create floating pie charts with ggplot (this is the second time I saw coord_polar(), but I did not figure out how to use it after fiddling a bit with it and reading its docs)


Answers (2)

Allan Cameron

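We can use scatterpie to get the plot you want, but it's a bit of a pain to use. It doesn't seem to like categorical variables, so these need to be converted to numeric via factor and relabelled in the scales. It also won't play nicely with coord_flip. The pies are drawn in data units, so they only come out roughly circular when one unit on each axis has a similar length on screen. That is why the counts are divided by 15 below and multiplied back by 15 in the axis labels.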
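The numeric-position trick can be seen on its own. A factor's integer codes serve as x positions, and indexing `levels()` with a break value turns the break back into a label. That is what `~ levels(fake_dataframe$annotation)[.x]` does in the plotting code. Here is a minimal base-R illustration with made-up level names; `label_from_levels()` is a hypothetical name.

```r
annotation <- factor(c("celltypeB", "celltypeA", "celltypeC"))
levels(annotation)                    # "celltypeA" "celltypeB" "celltypeC"
positions <- as.numeric(annotation)   # integer codes used as x positions
positions                             # 2 1 3

# Turns numeric axis breaks back into the original category names
label_from_levels <- function(breaks) levels(annotation)[breaks]
label_from_levels(c(1, 3))            # "celltypeA" "celltypeC"
```

R truncates fractional indices, so a break at 2.5 would be labelled "celltypeB". Passing integer breaks to `scale_x_continuous()`, for example `breaks = seq_len(nlevels(fake_dataframe$annotation))`, avoids that.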

So the first step is to reshape your data:

library(tidyverse)
library(scatterpie)

fake_dataframe <- fake_dataframe %>%
  rename(pos = fraction_of_total_celltype_abundance) %>%
  mutate(neg = 1 - pos) %>%
  mutate(annotation = fct_reorder(as.factor(annotation),
                                  as.factor(species_organ),
                                  ~mean(as.numeric(.x)))) %>%
  mutate(annotation2 = as.numeric(annotation)) %>%
  mutate(count_in_integratedcluster = count_in_integratedcluster/15)
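For readers less familiar with forcats, here is a rough base-R equivalent of this reshaping step on a four-row subset, with fractions rounded from the toy data. It assumes each annotation appears only once, as in the question. When species codes tie, `fct_reorder()` should keep the original alphabetical level order. The levels therefore end up grouped by species and alphabetical within each species.

```r
toy <- data.frame(
  species_organ = c("frog", "frog", "bat", "bird"),
  annotation    = c("celltype2", "celltype1", "celltypeA", "celltypeAlpha"),
  fraction      = c(0.785, 0.613, 0.215, 0.707),
  stringsAsFactors = FALSE
)

# Complementary slices for each pie: captured (pos) vs not captured (neg)
toy$pos <- toy$fraction
toy$neg <- 1 - toy$pos

# Order annotation levels so each species' celltypes sit next to each other:
# species alphabetically, then annotation alphabetically within a species
lvl_order <- order(toy$species_organ, toy$annotation)
toy$annotation  <- factor(toy$annotation, levels = toy$annotation[lvl_order])
toy$annotation2 <- as.numeric(toy$annotation)   # numeric positions for scatterpie
```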

Then the plotting code is:

ggplot(fake_dataframe,
       aes(x = annotation2, y = count_in_integratedcluster)) +
  geom_segment(aes(xend = annotation2, yend = 0), color = "grey") +
  geom_scatterpie(cols = c("pos", "neg"),
                  data = fake_dataframe,
                  aes(x = annotation2, y = count_in_integratedcluster)) +
  scale_fill_manual(values = c(pos = "black", neg = "white")) +
  coord_flip() +
  theme(
    legend.position = "none",
    panel.border = element_blank(),
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  facet_grid(species_organ~., scales = "free_y", space = "free_y") +
  labs(title = paste(paste("integrated", cluster_name), ",", 
                     how_many_cells_in_this_integrated_cluster, "total cells"), 
       subtitle = paste0("In this integrated cluster, ",
                         "see what cells contribute per species"),
       y = "How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)",
       x = NULL) +
  scale_y_continuous(labels = ~.x * 15) +
  scale_x_continuous(labels = ~ levels(fake_dataframe$annotation)[.x])
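The two `labels` lambdas at the end undo the earlier transformations. The count-axis labels multiply the plotted values by 15 again, so the axis still reads in cell counts. Below is a base-R check of that round trip. The factor 15 is the answer's choice for getting round pies at the plotted size; a different figure size may need a different factor.

```r
scale_factor <- 15                    # the constant used in the answer
counts <- c(253, 42, 58)              # a few counts from the toy data
plotted <- counts / scale_factor      # values geom_scatterpie places on the count axis

# Same idea as `labels = ~ .x * 15`: turn plotted break values back into counts
label_counts <- function(breaks) breaks * scale_factor
label_counts(c(0, 5, 10, 15))         # 0 75 150 225
```

Because the breaks are still chosen on the scaled values, the printed counts typically land on multiples of 15 unless `breaks` is set explicitly.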


r2evans

I think pies on each end will be a bit difficult, and I couldn't come up with something on the fly; I hope somebody else will have a good idea. (Challenges include preserving a 1-to-1 aspect ratio when one of the axes is ordinal, not continuous.)
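To make the aspect-ratio problem concrete: a pie drawn in data units stays round only if one unit on the count axis is as long on screen as one unit on the category axis. The back-of-the-envelope sketch below assumes a hypothetical panel 6 inches wide showing counts 0 to 270, and 1.5 inches tall showing 5 celltypes. It ignores axis expansion, and `aspect_correction()` is a hypothetical helper. Dividing the counts by the result would make the pies roughly circular at that size only, which is why such fixes are fragile when the figure is resized.

```r
# Hypothetical helper: factor to divide the counts by so that one data unit on the
# count axis is drawn as long as one data unit on the category axis.
aspect_correction <- function(count_range, panel_width_in, n_categories, panel_height_in) {
  count_units_per_inch    <- count_range / panel_width_in
  category_units_per_inch <- n_categories / panel_height_in
  count_units_per_inch / category_units_per_inch
}

# Assumed panel: counts 0-270 across 6 inches, 5 celltypes stacked in 1.5 inches
aspect_correction(count_range = 270, panel_width_in = 6,
                  n_categories = 5, panel_height_in = 1.5)   # about 13.5
```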

Until then, two options:

Textual

Add a geom_text layer with the percentage. (I added the caption for fun.)

ggplot(fake_dataframe) +
  geom_segment( aes(x=annotation, xend=annotation, y=0, yend=count_in_integratedcluster), color="grey") +
  geom_point( aes(x=annotation, y=count_in_integratedcluster, color=species_organ), size=3 ) +
  # BEGIN: addition
  geom_text(aes(x=annotation, y=count_in_integratedcluster,
                label=sprintf("%.2f%%", 100 * fraction_of_total_celltype_abundance)),
            hjust = -0.2) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  # END: addition
  coord_flip()+
  theme(
    legend.position = "none",
    panel.border = element_blank(),
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  xlab("") +
  ylab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
  facet_wrap(~species_organ, ncol=1, scales="free_y") +
  labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"), 
       subtitle = "In this integrated cluster, see what cells contribute per species",
       caption = "Percentage indicates the fraction of cells of a given source celltype that were bundled into this integrated cluster")

[Figure: ggplot with percent indications]
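The label format in the textual option multiplies each fraction by 100 and prints two decimals followed by a literal percent sign (`%%` in `sprintf()`). Here is a quick base-R check with two fractions from the toy data. If the scales package is available, `scales::percent(x, accuracy = 0.01)` is an alternative.

```r
fraction <- c(253 / 413, 9 / 238)            # frog celltype1 and bat celltypeE
pct_labels <- sprintf("%.2f%%", 100 * fraction)
pct_labels                                   # "61.26%" "3.78%"
```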

Filled rectangles

I suggest that a vertically filled rectangle might also be useful to show graphically how much is included. Since geom_tile fills along its height, though, we need to undo the coord_flip() and deal with x and y in their natural domain. This involves (literally) swapping the letters x and y in your code and removing the flip (including swapping the xlab and ylab). (Not sure why you needed to flip in the first place ...)

# swap x/y (un-flip coord)
barwid <- 5; barht <- 0.8
ggplot(fake_dataframe) +
  geom_segment( aes(y=annotation, yend=annotation, x=0, xend=count_in_integratedcluster-(barwid/2)), color="grey") +
  geom_tile(aes(y=annotation, x=count_in_integratedcluster, color=species_organ), width=barwid, height=barht, fill=NA) +
  geom_tile(aes(y=annotation, x=count_in_integratedcluster,
                height = barht * fraction_of_total_celltype_abundance,
                color=species_organ, fill=species_organ),
            width=barwid) +
  theme(
    legend.position = "none",
    panel.border = element_blank(),
    panel.spacing = unit(0.1, "lines"),
    strip.text.x = element_text(size = 8)
  ) +
  ylab(NULL) +
  xlab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
  facet_wrap(~species_organ, ncol=1, scales="free_y") +
  labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"), 
       subtitle = "In this integrated cluster, see what cells contribute per species",
       caption = "Filled rectangles indicate the fraction of cells of a given source celltype that were bundled into this integrated cluster; full color means 100%")

[Figure: ggplot with filled rectangles]
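geom_tile() is centred on its (x, y) position. The inner tile of height `barht * fraction_of_total_celltype_abundance` therefore grows outward from the middle of the row, not up from the bottom of the outline. Here is a base-R sketch of the vertical extents for a single row at position 1:

```r
barht <- 0.8                              # outer tile height, as in the answer
fraction <- c(1, 0.5, 0.1)                # example fractions
row_centre <- 1                           # discrete rows sit at 1, 2, 3, ...

inner_height <- barht * fraction
extents <- data.frame(
  fraction = fraction,
  ymin = row_centre - inner_height / 2,   # geom_tile spans y - height/2 ...
  ymax = row_centre + inner_height / 2    # ... to y + height/2
)
extents
```

A bottom-up fill would instead fix `ymin` at `row_centre - barht / 2` and set `ymax = ymin + barht * fraction`, for example with geom_rect(). With free facet scales, the numeric row positions are per panel, so that variant needs more care.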

This method may need a little more grooming depending on the publication medium and your preferences. You can add size=1 (or some other number) to the geom_tile(s) to thicken the stroke/line width of the tile outline; in ggplot2 3.4.0 and later this argument is called linewidth. Alternatively, make the first tile one color (white) and the second the real fill color, and remove the outline/stroke.

Beyond adding size= (or linewidth=), adjust barwid and barht until it looks the way you want.
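barwid is measured in count units along the x axis. The grey segment is shortened by half of barwid so that it stops at the tile's left edge. If the counts vary a lot between datasets, one option is to derive barwid from the data, for example as 2% of the largest count. That rule is an assumption and is not part of the answer.

```r
counts <- c(253, 245, 226, 42, 58)       # a few counts from the toy data
barwid <- 0.02 * max(counts)             # assumed rule of thumb: 5.06 count units here

seg_end    <- counts - barwid / 2        # grey segment stops here ...
tile_left  <- counts - barwid / 2        # ... exactly at the tile's left edge
tile_right <- counts + barwid / 2
```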

FYI, ylab("") is different from ylab(NULL): the former preserves the space for an empty label. Not sure if that's important to you; I demonstrated with NULL here, but either will work.
