Reputation: 11762
I am pretty sure something like this has already been asked, but I don't know how to search for it.
I often get data in a wide format, as in my little example with 3 experiments (a-c). I normally convert it to long format and transform the values with some function (here log2 as an example).
What I often want to do is plot all experiments against each other, and I am looking for a handy way to do that. How can I convert my data frame to get facets for, say, a~b, a~c, b~c, ...?
So far I tidyr::spread the data again and run a ggplot command 3 times with the individual column names as x and y. Later I merge the individual graphs together.
Is there a more convenient way?
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(
  names = letters,
  a = 1:26,
  b = 1:13,
  c = 11:36
)
df %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value = log2(value))
EDIT
Since I got a very useful answer from @hdkrgr, I adapted my code a bit. The inner_join was a great trick that I can use to automate my idea. What I still miss is a clever filter to get rid of the redundant data, since I don't want to plot c~c or b~a if I already plot a~b.
I solved this for now by listing the pairings I want, but can anyone think of a straightforward solution? I couldn't come up with anything that gives me the unique pairings.
my_pairs <- c('a vs. b', 'a vs. c', 'b vs. c')
df %>%
  as_tibble() %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value = log2(value)) %>%
  inner_join(., ., by = c("names")) %>%
  mutate(pairing = sprintf('%s vs. %s', experiment.x, experiment.y)) %>%
  filter(pairing %in% my_pairs) %>%
  ggplot(aes(log2.value.x, log2.value.y)) +
  geom_point() +
  facet_wrap(~ pairing, labeller = label_both)
Upvotes: 4
Views: 804
Reputation: 20329
You could start by creating all combinations via combn and then work your way through:
library(purrr)
t(combn(names(df)[-1], 2)) %>% ## get all combinations
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(l = paste(V1, V2, sep = " vs. ")) %>%
  pmap_dfr(function(V1, V2, l)
    df %>%
      select(one_of(c(V1, V2))) %>% ## select the elements given by the combination
      mutate_all(log2) %>%
      setNames(c("x", "y")) %>%
      mutate(experiment = l)) %>%
  ggplot(aes(x, y)) +
  geom_point() +
  facet_wrap(~experiment)
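To see what the combn step produces before it is fed into pmap_dfr, here is a minimal base-R sketch. The `experiments` vector is a stand-in for `names(df)[-1]`, and `pair_matrix` and `pair_labels` are names made up for this illustration.

```r
# Stand-in for names(df)[-1]; hypothetical name, not part of the answer
experiments <- c("a", "b", "c")

# combn() returns one column per combination; t() turns them into rows
pair_matrix <- t(combn(experiments, 2))

# Build the facet labels the same way the answer does
pair_labels <- paste(pair_matrix[, 1], pair_matrix[, 2], sep = " vs. ")
print(pair_labels)
#> [1] "a vs. b" "a vs. c" "b vs. c"
```

Each unordered pair appears exactly once, so no extra filtering for b vs. a or c vs. c is needed with this approach.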
Upvotes: 1
Reputation: 1736
One way, starting from the long format, would be to do a self-join on the long data so that each row holds a combination of two experiments:
df %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value = log2(value)) %>%
  inner_join(., ., by = c("names")) %>%
  ggplot(aes(log2.value.x, log2.value.y)) +
  geom_point() +
  facet_grid(experiment.y ~ experiment.x)
Edit: To avoid plotting redundant experiment-pairs, you can do:
df %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value = log2(value)) %>%
  inner_join(., ., by = c("names")) %>%
  filter(experiment.x < experiment.y) %>%
  ggplot(aes(log2.value.x, log2.value.y)) +
  geom_point() +
  facet_wrap(~experiment.y + experiment.x)
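The filter works because the self-join yields every ordered pair of experiment names, and a strict string comparison keeps only one ordering of each pair while dropping the self-pairs. A small base-R sketch of just that step, with `expand.grid` standing in for the join and all variable names chosen for illustration:

```r
# Hypothetical experiment names, as in the example data
experiments <- c("a", "b", "c")

# All ordered pairs, including self-pairs (what the self-join produces per name)
all_pairs <- expand.grid(x = experiments, y = experiments,
                         stringsAsFactors = FALSE)

# Strict "<" drops a~a, b~b, c~c and keeps a~b but not b~a
unique_pairs <- all_pairs[all_pairs$x < all_pairs$y, ]

nrow(all_pairs)
#> [1] 9
paste(unique_pairs$x, unique_pairs$y, sep = "~")
#> [1] "a~b" "a~c" "b~c"
```

Note that the comparison follows the locale's string ordering, which is unproblematic for simple names like these.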
Upvotes: 5
Reputation: 16842
This is really interesting because it's actually more complex than it first seems. One thing that sticks out is getting unique pairs of experiments—it seems like you'd want a vs b but not necessarily b vs a as well. To do that, you need the unique set of experiment pairs.
Initially, I tried to work from your gathered data, but realized it might be simpler to start from the wide version. Take the names of the experiments from the column names (you can do this multiple ways, but I just took the strings that aren't "names") and get the combinations of them. I pasted them together to make them a little easier to work with.
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(
names=letters,
a=1:26,
b=1:13,
c=11:36
) %>%
as_tibble()
exp <- stringr::str_subset(names(df), "names", negate = T)
pairs <- combn(exp, 2, paste, simplify = F, collapse = ",") %>%
unlist()
pairs
#> [1] "a,b" "a,c" "b,c"
Then, for each pair, extract the associated column names, do a little tidyeval to select those columns, and apply the log2 transform that you had. I had to detour here to rename the columns to something I could refer back to. I think this isn't necessary, but I couldn't get my tidyeval working inside the ggplot aes. Someone else might have an idea on that. Then make your plot, and label the axes and title accordingly. That leaves you with a list of 3 plots.
plots <- purrr::map(pairs, function(pair) {
  cols <- strsplit(pair, split = ",", fixed = TRUE)[[1]]
  df %>%
    select(names, !!cols[1], !!cols[2]) %>%
    mutate_at(vars(-names), log2) %>%
    rename(exp1 = !!cols[1], exp2 = !!cols[2]) %>%
    ggplot(aes(x = exp1, y = exp2)) +
    geom_point() +
    labs(x = cols[1], y = cols[2], title = pair)
})
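On the open question of referring to the columns inside aes: one possible way, assuming a ggplot2 version that supports the `.data` pronoun (3.0.0 or later), is to index `.data` with the column name as a string. The sketch below is an alternative to the rename detour, not the method used above. It uses lapply and a plain list of name pairs instead of the pasted strings.

```r
library(ggplot2)

# Same example data as in the question
df <- data.frame(names = letters, a = 1:26, b = 1:13, c = 11:36)

# One character vector of length 2 per unique pair of experiments
pairs <- combn(c("a", "b", "c"), 2, simplify = FALSE)

plots <- lapply(pairs, function(cols) {
  # .data[[...]] looks up a column by its name given as a string,
  # so no select/rename step is needed before plotting
  ggplot(df, aes(x = log2(.data[[cols[1]]]), y = log2(.data[[cols[2]]]))) +
    geom_point() +
    labs(x = cols[1], y = cols[2], title = paste(cols, collapse = ","))
})
```

This again gives a list of 3 plots, with the log2 transform applied inside aes rather than in a separate mutate step.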
Use your method of choice to put the plots together however you want. I went with cowplot, but I also like the patchwork package.
cowplot::plot_grid(plotlist = plots, nrow = 1)
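For comparison, a rough equivalent with patchwork might look like the sketch below. It assumes the patchwork package is installed and uses three simple stand-in plots so that it runs on its own. With the answer's code you would pass the `plots` list directly.

```r
library(ggplot2)

# Stand-in plots so the sketch is self-contained; not the answer's plots
df <- data.frame(a = 1:26, b = 1:13, c = 11:36)
plots <- list(
  ggplot(df, aes(a, b)) + geom_point(),
  ggplot(df, aes(a, c)) + geom_point(),
  ggplot(df, aes(b, c)) + geom_point()
)

# wrap_plots() accepts a list of plots and arranges them in a grid
combined <- patchwork::wrap_plots(plots, nrow = 1)
combined
```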
Upvotes: 3
Reputation: 39154
This is probably not what you want, but if the purpose is to explore the correlation pattern between the variables, you may want to consider ggpairs from the GGally package. It provides not only scatter plots but also correlation coefficients and distributions.
library(GGally)
ggpairs(df[, c("a", "b", "c")])
Upvotes: 2