Reputation: 2253
Let's say I want to compare the price of apples and oranges in each country in two different currencies: USA and BTC.
USA ~ fruit for each country
BTC ~ fruit for each country
library(tidyverse)
prices <- tibble(
  country = c(rep("USA", 6), rep("Spain", 6), rep("Korea", 6)),
  fruit = rep(c("apples", "apples", "apples", "oranges", "oranges", "oranges"), 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)
prices %>%
  group_by(country) %>%
  summarise(
    pval_USA = t.test(price_USA ~ fruit)$p.value,
    pval_BTC = t.test(price_BTC ~ fruit)$p.value
  )
Now let's say there are many columns and I want to use summarise_all instead of naming each column. Is there a way to perform a t-test within each group (country) and on each column (price_USA, price_BTC) using the dplyr::summarise_all function? The approaches I've tried so far have been giving me errors.
prices %>%
  group_by(country) %>%
  summarise_at(
    c("price_USA", "price_BTC"),
    function(x) {t.test(x ~ .$fruit)$p.value}
  )
> Error in model.frame.default(formula = x ~ .$fruit) :
variable lengths differ (found for '.$fruit')
Upvotes: 2
Views: 337
Reputation: 1428
You can do this by reshaping your data from wide to long format, so that each price column becomes its own group. Here's a solution using tidyr's pivot_longer together with dplyr:
library(tidyverse)
prices <- tibble(
  country = c(rep("USA", 6), rep("Spain", 6), rep("Korea", 6)),
  fruit = rep(c("apples", "apples", "apples", "oranges", "oranges", "oranges"), 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)
prices %>%
  pivot_longer(cols = starts_with("price"), names_to = "name",
               values_to = "price", names_prefix = "price_") %>%
  group_by(country, name) %>%
  summarise(pval = t.test(price ~ fruit)$p.value)
#> # A tibble: 6 x 3
#> # Groups:   country [3]
#>   country name   pval
#>   <chr>   <chr> <dbl>
#> 1 Korea   BTC   0.458
#> 2 Korea   USA   0.721
#> 3 Spain   BTC   0.732
#> 4 Spain   USA   0.526
#> 5 USA     BTC   0.916
#> 6 USA     USA   0.679
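If you would rather keep the data in wide format, one possible alternative is across(). This is only a sketch and assumes dplyr 1.0.0 or later, which the question does not state. The summarise_at attempt most likely fails because the . inside the anonymous function is the magrittr placeholder for the whole 18-row data frame, while x holds only the 6 values of the current group, hence "variable lengths differ". Inside across(), fruit is looked up in the current group, so the lengths match. The set.seed(1) call and the pval_price_* output names are my own additions for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random prices are reproducible

prices <- tibble(
  country   = rep(c("USA", "Spain", "Korea"), each = 6),
  fruit     = rep(rep(c("apples", "oranges"), each = 3), times = 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)

# One row per country, one p-value column per price column.
# `.x` is the current column restricted to the current group;
# `fruit` is resolved within that same group.
pvals <- prices %>%
  group_by(country) %>%
  summarise(
    across(
      starts_with("price_"),
      ~ t.test(.x ~ fruit)$p.value,
      .names = "pval_{.col}"  # yields pval_price_USA, pval_price_BTC
    )
  )

pvals
```

Compared with the pivot_longer approach, this keeps one row per country, which may be easier to read when there are only a few price columns. With many columns the long format is usually simpler to filter and plot.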
Upvotes: 2