I have the following tibble:
library(tidyverse)
tTest = tibble(Cells = rep(c("C1", "C2", "C3"), times = 3),
               Gene = rep(c("G1", "G2", "G3"), each = 3),
               Experiment_score = 1:9,
               Pattern1 = 1:9,
               Pattern2 = -(1:9),
               Pattern3 = 9:1) %>%
  group_by(Gene)
and I would like to correlate Experiment_score with each of the Pattern columns for every Gene.
Looking at the tidyverse across page and examples, I thought this would work:
# `corList` is a simple wrapper for `cor.test` to have exactly two outputs:
corList = function(x, y) {
  result = cor.test(x, y)
  return(list(stat = result$estimate, pval = result$p.value))
}
tTest %>%
  summarise(across(starts_with("Pattern"),
                   ~ corList(Experiment_score, .x),
                   .names = "{.col}_corr_{.fn}"))
I have found a solution by melting the Pattern columns, and I will post it below for completeness, but the challenge is that I have dozens of Pattern columns and millions of rows. If I melt the Pattern columns, I end up with half a billion rows, seriously hampering my ability to work with the data.
EDIT: My own imperfect solution:
# `corVect` is a simple wrapper for `cor.test` to have exactly two outputs:
corVect = function(x, y) {
  result = cor.test(x, y)
  return(c(stat = result$estimate, pval = result$p.value))
}
tTest %>%
  # Melt the Pattern columns into long format (one row per Gene/Cell/Pattern)
  pivot_longer(starts_with("Pattern"), names_to = "Pattern", values_to = "Strength") %>%
  group_by(Gene, Pattern) %>%
  # Two rows per group: the correlation estimate, then its p-value
  summarise(CorrVal = corVect(Experiment_score, Strength)) %>%
  mutate(CorrType = c("corr", "corr_pval")) %>%
  # Reformat
  pivot_wider(id_cols = c(Gene, Pattern), names_from = CorrType, values_from = CorrVal)
Reputation: 7626
To get the desired result in one step, have the function return a tibble rather than a list, and set .unpack = TRUE in across() (this argument is available from dplyr 1.1.0 onwards). Here using a conveniently named corTibble function:
library(tidyverse)

tTest = tibble(
  Cells = rep(c("C1", "C2", "C3"), times = 3),
  Gene = rep(c("G1", "G2", "G3"), each = 3),
  Experiment_score = 1:9,
  Pattern1 = 1:9 + rnorm(9), # added some noise
  Pattern2 = -(1:9 + rnorm(9)),
  Pattern3 = 9:1 + rnorm(9)
) %>%
  group_by(Gene)

corTibble = function(x, y) {
  result = cor.test(x, y)
  return(tibble(stat = result$estimate, pval = result$p.value))
}

tTest %>% summarise(across(
  starts_with("Pattern"),
  ~ corTibble(Experiment_score, .x),
  .names = "{.col}_corr",
  .unpack = TRUE
))
#> # A tibble: 3 × 7
#> Gene Pattern1_corr_stat Pattern1_corr_pval Pattern2…¹ Patte…² Patte…³ Patte…⁴
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 G1 0.947 0.208 -0.991 0.0866 -1.00 0.0187
#> 2 G2 0.964 0.172 -0.872 0.325 -0.981 0.126
#> 3 G3 0.995 0.0668 -0.680 0.524 -0.409 0.732
#> # … with abbreviated variable names ¹Pattern2_corr_stat, ²Pattern2_corr_pval,
#> # ³Pattern3_corr_stat, ⁴Pattern3_corr_pval
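As a variation on the same idea, .unpack can also take a glue specification instead of TRUE, which lets the naming be done in one place. The following is only a minimal sketch: the `toy` data is hypothetical and deterministic (not the noisy data above), and `unname()` is added here as an assumption to drop the "cor" name that cor.test() attaches to its estimate.

```r
library(dplyr)

# Hypothetical toy data, chosen so no correlation is exactly +/-1
toy <- tibble(
  Gene = rep(c("G1", "G2"), each = 4),
  Experiment_score = c(1, 2, 3, 4, 1, 2, 3, 4),
  Pattern1 = c(1, 3, 2, 5, 2, 1, 4, 3),
  Pattern2 = c(4, 2, 3, 1, 1, 4, 2, 3)
)

# Same wrapper as above, returning a one-row tibble per call
corTibble <- function(x, y) {
  result <- cor.test(x, y)
  tibble(stat = unname(result$estimate), pval = result$p.value)
}

res <- toy %>%
  group_by(Gene) %>%
  summarise(across(
    starts_with("Pattern"),
    ~ corTibble(Experiment_score, .x),
    # {outer} is the across() column name, {inner} the tibble column name
    .unpack = "{outer}_corr_{inner}"
  ))

names(res)
```

With this specification the columns should come out named like Pattern1_corr_stat and Pattern1_corr_pval, the same as with .names = "{.col}_corr" plus .unpack = TRUE, and no pivoting to long format is involved.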