Reputation: 11762

scatter plot against all groups for a long data frame

I am pretty sure something like this has already been asked, but I don't know how to search for it.

I often get data in a wide format, as in my small example with 3 experiments (a-c). I normally convert it to long format and transform the values with some function (here log2 as an example).

What I often want to do is plot all experiments against each other, and I am looking for a handy way to do that. How can I reshape my data frame to get facets such as a~b, a~c and b~c?

So far I tidyr::spread the data again and run a ggplot command 3 times with the individual column names as x and y. Afterwards I merge the individual graphs together.

Is there a more convenient way?

library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  names=letters,
  a=1:26,
  b=1:13,
  c=11:36
)

df %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value=log2(value))

EDIT
Since I got a very useful answer from @hdkrgr, I adapted my code a bit. The inner_join was a great trick that lets me automate my idea. What I still miss is a clever filter to get rid of the redundant data, since I don't want to plot c~c, or b~a if I already plot a~b. For now I solve this by listing the pairings I want, but can anyone think of a straightforward solution? I couldn't come up with anything that gives me the unique pairings.

my_pairs <- c('a vs. b', 'a vs. c', 'b vs. c')

df %>%
  as_tibble() %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value=log2(value))  %>%
  inner_join(., ., by=c("names")) %>%
  mutate(pairing=sprintf('%s vs. %s', experiment.x, experiment.y)) %>%
  filter(pairing %in% my_pairs) %>% 
  ggplot(aes(log2.value.x, log2.value.y)) + 
  geom_point() + 
  facet_wrap( ~ pairing, labeller=label_both)

Upvotes: 4

Answers (4)

thothal

Reputation: 20329

You could start by creating all combinations via combn and then work your way through:

library(purrr)

t(combn(names(df)[-1], 2)) %>% ## get all combinations  
   as.data.frame(stringsAsFactors = FALSE) %>% 
   mutate(l = paste(V1, V2, sep = " vs. ")) %>%
   pmap_dfr(function(V1, V2, l) 
     df %>% 
       select(one_of(c(V1, V2))) %>% ## select the elements given by the combination
       mutate_all(log2) %>%
       setNames(c("x", "y")) %>%
       mutate(experiment = l)) %>%
   ggplot(aes(x, y)) + geom_point() + facet_wrap(~experiment)

Upvotes: 1

hdkrgr

Reputation: 1736

One way, starting from long format, would be to do a self-join on the long data in order to get all combinations of two experiments in each row:

df %>%
    tidyr::gather(experiment, value, -names) %>%
    mutate(log2.value=log2(value)) %>%
    inner_join(., ., by=c("names")) %>% 
    ggplot(aes(log2.value.x, log2.value.y)) + geom_point() + facet_grid(experiment.y ~ experiment.x)

Edit: To avoid plotting redundant experiment-pairs, you can do:

df %>%
    tidyr::gather(experiment, value, -names) %>%
    mutate(log2.value=log2(value)) %>%
    inner_join(., ., by=c("names")) %>%
    filter(experiment.x < experiment.y) %>% 
    ggplot(aes(log2.value.x, log2.value.y)) + geom_point() + facet_wrap(~experiment.y + experiment.x)
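For readers on newer package versions, here is a minimal sketch of the same self-join idea. It assumes tidyr >= 1.0.0 (for pivot_longer, which supersedes gather) and dplyr >= 1.1.0, where a self-join like this one triggers a many-to-many warning unless the relationship argument is set. The names long, pairs_long and pairing are only illustrative and not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(names = letters, a = 1:26, b = 1:13, c = 11:36)

# Long format: one row per (names, experiment)
long <- df %>%
  pivot_longer(-names, names_to = "experiment", values_to = "value") %>%
  mutate(log2.value = log2(value))

# Self-join on names, then keep each unordered pair once.
# relationship = "many-to-many" is an assumption about dplyr >= 1.1.0;
# it only silences the warning and does not change the result.
pairs_long <- long %>%
  inner_join(long, by = "names", relationship = "many-to-many") %>%
  filter(experiment.x < experiment.y) %>%
  mutate(pairing = paste(experiment.x, "vs.", experiment.y))

p <- ggplot(pairs_long, aes(log2.value.x, log2.value.y)) +
  geom_point() +
  facet_wrap(~pairing)
```

The string comparison experiment.x < experiment.y drops both the self-pairs (c~c) and the mirrored pairs (b~a), so the three experiments leave exactly three facets.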

Upvotes: 5

camille

Reputation: 16842

This is really interesting because it's actually more complex than it first seems. One thing that sticks out is getting unique pairs of experiments—it seems like you'd want a vs b but not necessarily b vs a as well. To do that, you need the unique set of experiment pairs.

Initially, I tried to work from your gathered data, but realized it might be simpler to start from the wide version. Take the names of the experiments from the column names—you can do this multiple ways, but I just took the strings that aren't "names"—and get the combinations of them. I pasted them together to make them a little easier to work with.

library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  names=letters,
  a=1:26,
  b=1:13,
  c=11:36
) %>%
  as_tibble()

exp <- stringr::str_subset(names(df), "names", negate = T)

pairs <- combn(exp, 2, paste, simplify = F, collapse = ",") %>%
  unlist()
pairs
#> [1] "a,b" "a,c" "b,c"

Then, for each pair, extract the associated column names, do a little tidyeval to select those columns, do the log2 transform that you had. I had to detour here to rename the columns with something I could refer back to—I think this isn't necessary, but I couldn't get my tidyeval working inside the ggplot aes. Someone else might have an idea on that. Then make your plot, and label the axes and title accordingly. That leaves you with a list of 3 plots.

plots <- purrr::map(pairs, function(pair) {
  cols <- strsplit(pair, split = ",", fixed = T)[[1]]
  df %>%
    select(names, !!cols[1], !!cols[2]) %>%
    mutate_at(vars(-names), log2) %>%
    rename(exp1 = !!cols[1], exp2 = !!cols[2]) %>%
    ggplot(aes(x = exp1, y = exp2)) +
      geom_point() +
      labs(x = cols[1], y = cols[2], title = pair)
})
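On the open point about referring to the columns inside aes(): one option that may work, assuming ggplot2 >= 3.0.0, is the .data pronoun, which accepts a column name given as a string. The sketch below skips the rename step entirely. plot_pair and pair_list are hypothetical names used only for this illustration.

```r
library(ggplot2)

df <- data.frame(names = letters, a = 1:26, b = 1:13, c = 11:36)

# plot_pair() is a hypothetical helper: xcol and ycol are column names as strings
plot_pair <- function(data, xcol, ycol) {
  ggplot(data, aes(x = log2(.data[[xcol]]), y = log2(.data[[ycol]]))) +
    geom_point() +
    labs(x = xcol, y = ycol, title = paste(xcol, "vs.", ycol))
}

# All unordered pairs of experiment columns, as a list of character vectors
pair_list <- combn(setdiff(names(df), "names"), 2, simplify = FALSE)

plots <- lapply(pair_list, function(p) plot_pair(df, p[1], p[2]))
```

The log2 transform happens inside aes() here, so the data frame itself is left untouched.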

Use your method of choice to put the plots together however you want. I went with cowplot, but I also like the patchwork package.

cowplot::plot_grid(plotlist = plots, nrow = 1)
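For comparison, a rough patchwork equivalent could look like the sketch below, assuming the patchwork package is installed. The three plots are written out by hand only to keep the example self-contained, and the name combined is illustrative.

```r
library(ggplot2)
library(patchwork)

df <- data.frame(names = letters, a = 1:26, b = 1:13, c = 11:36)

# Three pairwise scatter plots, written out explicitly for brevity
plots <- list(
  ggplot(df, aes(log2(a), log2(b))) + geom_point(),
  ggplot(df, aes(log2(a), log2(c))) + geom_point(),
  ggplot(df, aes(log2(b), log2(c))) + geom_point()
)

# wrap_plots() accepts a list of plots and lays them out in one row
combined <- wrap_plots(plots, nrow = 1)
```

Printing combined draws the three panels side by side, much like the cowplot call.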

Upvotes: 3

www

Reputation: 39154

This is probably not what you want, but if the purpose is to explore the correlation pattern between the variables, you may want to consider ggpairs from the GGally package. It provides not only scatter plots, but also correlation coefficients and distributions.

library(GGally)

ggpairs(df[, c("a", "b", "c")])

Upvotes: 2
