Reputation: 33
I've made a plot I am mostly happy with, using Tidyverse in R, but the plot needs to show a bit more information and I haven't managed to work out how to do that yet.
The point of the plot is to show how a bunch of cells from three different animals were algorithmically sorted and bunched together according to their biology. Each animal had a lot of different celltypes, and there were a lot of output clusters of cells; I am plotting one output cluster, and after looking at all the cells from each animal that were sorted into this cluster, I chose to show the top 5 celltype names from each source animal. The plot shows this nicely (to me, at least), but it doesn't show whether ALL the cells of a given source celltype were bundled into this new cluster, or half, or almost none, etc.
Here is the code I used, and the plot I got (and mostly like!).
library(tidyverse)
# create the contents of the toy dataset, then add together
species_organ <- c(rep("frog", 5),
rep("bat", 5),
rep("bird", 5)
)
annotation <- c("celltype1", "celltype2", "celltype3", "celltype4", "celltype5",
"celltypeA", "celltypeB", "celltypeC", "celltypeD", "celltypeE",
"celltypeAlpha", "celltypeBeta", "celltypeGamma", "celltypeDelta", "celltypeEpsilon"
)
count_in_integratedcluster <- c(253, 245, 226, 187, 185, 42, 18, 17, 11, 9, 58, 16, 8, 8, 7)
annotation_count_in_source_dataset <- c(413, 312, 349, 410, 233, 195, 198, 56, 166, 238, 82, 68, 270, 226, 81)
fraction_of_total_celltype_abundance <- count_in_integratedcluster / annotation_count_in_source_dataset
fake_dataframe <- data.frame(species_organ, annotation, count_in_integratedcluster, annotation_count_in_source_dataset, fraction_of_total_celltype_abundance)
# a few other things to decorate the plot with
how_many_cells_in_this_integrated_cluster <- 5056
cluster_name = "cluster6"
# now we make a lollipop plot
plot_lollipop_faceted.top5 <- ggplot(fake_dataframe) +
geom_segment( aes(x=annotation, xend=annotation, y=0, yend=count_in_integratedcluster), color="grey") +
geom_point( aes(x=annotation, y=count_in_integratedcluster, color=species_organ), size=3 ) +
coord_flip()+
theme(
legend.position = "none",
panel.border = element_blank(),
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
xlab("") +
ylab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
facet_wrap(~species_organ, ncol=1, scale="free_y") +
labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"),
subtitle = "In this integrated cluster, see what cells contribute per species")
(plot I mostly like but which needs improvement)
An "easy" graphical fix would be to replace the geom_point with a cute little pie chart, with colour filling to report whether 90% of "bird muscle cells" or just 10% of "bird muscle cells" were ultimately apportioned to this cluster by the algorithm.
Here is a pencil sketch of what the graph could look like if I made the swap I am looking for.
(pencil sketch of improved plot)
Any solution has to be in R, and I would appreciate Tidyverse-based approaches but I'm willing to try other approaches that convey the desired set of information.
I've looked at other related questions and unfortunately couldn't manage to make the suggested methods work for me, or else the suggested solution doesn't seem to be useful in my scenario; so far, I've examined:
- R::ggplot2::geom_points: how to swap points with pie charts? (scatterpie docs didn't help me make sense of what to do to implement suggestions)
- ggplot use small pie charts as points with geom_point (the pie is nice, but I don't want to lose the other information currently conveyed by my plot already)
- Plotting pie charts in ggplot2 (title sounds right, but content was not helpful)
- create floating pie charts with ggplot (this is the second time I saw coord_polar() but I did not figure out how to use it after fiddling a bit with it/reading its docs)
Upvotes: 3
Views: 116
Reputation: 174526
We can use scatterpie to get the plot you want, but it's a bit of a pain to use. It doesn't seem to like categorical variables, so these need to be converted to numeric via factor and relabelled in the scales. It also won't play nicely with coord_flip, so you will need to rescale the count axis to get the pies circular (below, the counts are divided by 15 and the axis labels are multiplied back).
So the first step is to reshape your data:
library(tidyverse)
library(scatterpie)
fake_dataframe <- fake_dataframe %>%
rename(pos = fraction_of_total_celltype_abundance) %>%
mutate(neg = 1 - pos) %>%
mutate(annotation = fct_reorder(as.factor(annotation),
as.factor(species_organ),
~mean(as.numeric(.x)))) %>%
mutate(annotation2 = as.numeric(annotation)) %>%
mutate(count_in_integratedcluster = count_in_integratedcluster/15)
Then the plotting code is:
ggplot(fake_dataframe,
aes(x = annotation2, y = count_in_integratedcluster)) +
geom_segment(aes(xend = annotation2, yend = 0), color = "grey") +
geom_scatterpie(cols = c("pos", "neg"),
data = fake_dataframe,
aes(x = annotation2, y = count_in_integratedcluster)) +
scale_fill_manual(values = c(pos = "black", neg = "white")) +
coord_flip() +
theme(
legend.position = "none",
panel.border = element_blank(),
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
facet_grid(species_organ~., scale = "free_y", space = "free_y") +
labs(title = paste(paste("integrated", cluster_name), ",",
how_many_cells_in_this_integrated_cluster, "total cells"),
subtitle = paste0("In this integrated cluster, ",
"see what cells contribute per species"),
y = "How many times cells of this original annotation (y-axis)
showed up in this integrated cluster (plot title)",
x = NULL) +
scale_y_continuous(labels = ~.x * 15) +
scale_x_continuous(labels = ~ levels(fake_dataframe$annotation)[.x])
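The two round trips used above (category to numeric position and back, count to rescaled count and back) can be looked at in isolation. This is a minimal base R sketch on a few toy values; the names annotation_f, annotation_num, shrink and count_scaled are made up for illustration and are not part of scatterpie or the code above.

```r
# Toy values for illustration only (a subset, not the full dataset).
annotation <- c("celltype2", "celltypeA", "celltype1")
count      <- c(245, 42, 253)

# Categorical -> numeric position: fix the level order explicitly,
# then take the integer codes of the factor.
annotation_f   <- factor(annotation,
                         levels = c("celltype1", "celltype2", "celltypeA"))
annotation_num <- as.numeric(annotation_f)

# Numeric position -> label: the same lookup that the
# scale_x_continuous(labels = ...) helper performs for each axis break.
labels_back <- levels(annotation_f)[annotation_num]

# Counts are shrunk onto a range comparable to the positions,
# and the axis labels multiply them back.
shrink       <- 15   # assumed factor, matching the /15 and *15 above
count_scaled <- count / shrink
count_labels <- count_scaled * shrink
```

The shrink factor is a judgment call: it only needs to bring the count range close enough to the position range that the pies do not come out squashed.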
Upvotes: 3
Reputation: 160952
I think pies on each end will be a bit difficult, and I couldn't come up with something on the fly; I hope somebody else will have a good idea. (Challenges include preserving a 1-to-1 aspect ratio when one of the axes is ordinal, not continuous.)
Until then, two options:
Add geom_text with the percentage. (I added the caption for fun.)
ggplot(fake_dataframe) +
geom_segment( aes(x=annotation, xend=annotation, y=0, yend=count_in_integratedcluster), color="grey") +
geom_point( aes(x=annotation, y=count_in_integratedcluster, color=species_organ), size=3 ) +
# BEGIN: addition
geom_text(aes(x=annotation, y=count_in_integratedcluster,
label=sprintf("%0.02f%%", 100 * fraction_of_total_celltype_abundance)),
hjust = -0.2) +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
# END: addition
coord_flip()+
theme(
legend.position = "none",
panel.border = element_blank(),
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
xlab("") +
ylab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
facet_wrap(~species_organ, ncol=1, scale="free_y") +
labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"),
subtitle = "In this integrated cluster, see what cells contribute per species",
caption = "Percentage indicates the share of cells of a given source celltype that are bundled into the new cluster")
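The label passed to geom_text is built with sprintf. Here is a small base R sketch of just that formatting step, using three count pairs from the question's toy data (celltype1, celltypeAlpha, celltypeEpsilon); the one-decimal variant is an optional alternative, not something the plot above uses.

```r
# Counts taken from the question's toy data.
count_in_cluster <- c(253, 58, 7)
count_in_source  <- c(413, 82, 81)
fraction <- count_in_cluster / count_in_source

# Same format string as in the geom_text() call: two decimals,
# followed by a literal percent sign ("%%").
label_2dp <- sprintf("%0.02f%%", 100 * fraction)
# "61.26%" "70.73%" "8.64%"

# Optional: one decimal is usually enough on a crowded plot.
label_1dp <- sprintf("%.1f%%", 100 * fraction)
# "61.3%" "70.7%" "8.6%"
```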
I suggest that a vertically-filled rectangle might also be useful to demonstrate graphically how much is included. Since geom_tile fills along its height, though, that means we need to undo the coord_flip() and deal with x and y in their natural domain. This involves (literally) swapping the letters x and y in your code and removing the flip (including the xlab and ylab). (Not sure why you needed to flip in the first place ...)
# swap x/y (un-flip coord)
barwid <- 5; barht <- 0.8
ggplot(fake_dataframe) +
geom_segment( aes(y=annotation, yend=annotation, x=0, xend=count_in_integratedcluster-(barwid/2)), color="grey") +
geom_tile(aes(y=annotation, x=count_in_integratedcluster, color=species_organ), width=barwid, height=barht, fill=NA) +
geom_tile(aes(y=annotation, x=count_in_integratedcluster,
height = barht * fraction_of_total_celltype_abundance,
color=species_organ, fill=species_organ),
width=barwid) +
theme(
legend.position = "none",
panel.border = element_blank(),
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
ylab(NULL) +
xlab("How many times cells of this original annotation (y-axis)\nshowed up in this integrated cluster (plot title)") +
facet_wrap(~species_organ, ncol=1, scale="free_y") +
labs(title = paste(paste("integrated", cluster_name, sep = " "), ",", how_many_cells_in_this_integrated_cluster, "total cells"),
subtitle = "In this integrated cluster, see what cells contribute per species",
caption = "Filled rectangles indicate the share of cells of a given source celltype that are bundled into the new cluster; full-color means 100% used")
This method can take a little more grooming based on the publication mechanism and your preferences. You can add size=1 (or some number) to the geom_tile(s) to change the stroke/line-width on the outline of the tile, to thicken it up; in ggplot2 3.4.0 and later, the argument for this is linewidth=. An alternative to this would be to make the first tile one color (white) and the second the real fill color, and then you can remove the outline/stroke.
Beyond adding size=, work with barwid and barht to get the look you want.
FYI, ylab("") is different from ylab(NULL), as the former preserves the space for an empty label. Not sure if that's important to you; I demonstrated with NULL here, but it'll work however you want.
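The white-base-tile alternative mentioned above could look roughly like this. It is a sketch on a three-row toy data frame, not a drop-in replacement for the full plot: the data frame toy, the fill color "steelblue", and the lack of facets are all simplifications chosen for illustration. As in the code above, the filled tile is centred on its row, so a partial fill grows outward from the middle of the white tile rather than up from its bottom edge.

```r
library(ggplot2)

# Toy data for illustration: the three largest frog celltypes.
toy <- data.frame(
  annotation = c("celltype1", "celltype2", "celltype3"),
  count      = c(253, 245, 226),
  fraction   = c(253 / 413, 245 / 312, 226 / 349)
)
barwid <- 5
barht  <- 0.8

p <- ggplot(toy, aes(x = count, y = annotation)) +
  # stem of the lollipop, drawn first so the tiles sit on top of it
  geom_segment(aes(xend = 0, yend = annotation), color = "grey") +
  # full-size white tile: stands for 100% of the source celltype
  geom_tile(width = barwid, height = barht, fill = "white", color = NA) +
  # filled tile: its height is scaled by the fraction in this cluster
  geom_tile(aes(height = barht * fraction),
            width = barwid, fill = "steelblue", color = NA)

# Heights actually drawn for the filled tiles (third layer)
filled <- layer_data(p, 3)
drawn_height <- as.numeric(filled$ymax - filled$ymin)
```

Printing p draws the plot; drawn_height is only there to check that the filled tiles come out as barht * fraction.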
Upvotes: 2