I have a dataset similar to the one below. I need to use a loop to run a permutation test for differences in means. My main problem is that I have to loop through the columns of the dataset, and I don't know how.
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)
id X1 X2 category
1 1 0.5584823 -2.3135133 good
2 2 -0.1115585 0.4731869 good
3 3 -0.7435472 -0.0231894 bad
4 4 -0.6673812 0.7470000 bad
5 5 -1.2959973 0.4255970 bad
Inside the loop I basically need to do this:
df %>% filter(category == "bad") %>% select(X1) %>% summarise(mean_X_bad = mean(X1))
df %>% filter(category == "good") %>% select(X1) %>% summarise(mean_X_good = mean(X1))
I need this for both X1 and X2 (and 98 other X variables not shown here).
So for each X from 1 to 100, I need the mean of X in the "good" group and the mean of X in the "bad" group, so that I can then run a permutation test on the difference in means between the groups for every X.
I don't know how to write a loop that selects a column, splits it by category, and returns the mean of each subset. I assume that to run the permutation test I need a vector of the "good" means and a vector of the "bad" means to compare. So I guess that has to be the result of the first loop?
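A minimal base R sketch of that first loop, assuming the example data from the question (with set.seed(24) added so the numbers repeat). It picks the X columns by name, so it should work unchanged for X1 through X100, and returns one named vector of "good" means and one of "bad" means. The name x_cols is just an illustrative choice.

```r
# Example data as in the question; set.seed() added for reproducibility
set.seed(24)
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)

# All X columns by name (X1, X2, ..., however many there are)
x_cols <- grep("^X\\d+$", names(df), value = TRUE)

# sapply() loops over the column names and returns a named vector of means
good_means <- sapply(x_cols, function(col) mean(df[df$category == "good", col]))
bad_means  <- sapply(x_cols, function(col) mean(df[df$category == "bad", col]))

# Observed difference in means (good minus bad) for each column
obs_diff <- good_means - bad_means
print(rbind(good = good_means, bad = bad_means, diff = obs_diff))
```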
If we want to loop, then use map2. Here each category is paired with one column ('bad' with 'X1', 'good' with 'X2'): pass these as two vectors to map2, filter and select the dataset, and summarise the mean of the selected column under a new name.
library(tidyverse)
# .x is the category, .y is the name of the column to average
map2(c("bad", "good"), c("X1", "X2"), ~
  df %>%
    filter(category == .x) %>%
    select(all_of(.y)) %>%
    summarise(!! paste0("mean_X_", .x) := mean(!! rlang::sym(.y))))
#[[1]]
# mean_X_bad
#1 -0.4954794
#[[2]]
# mean_X_good
#1 0.7497338
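For comparison, the same pairwise loop can be written in base R with Map(), which, like map2, walks two vectors in parallel. This is a sketch, not part of the original answer, assuming the example data built as in the question with set.seed(24); the results are given the same mean_X_bad / mean_X_good names.

```r
# Example data as in the question
set.seed(24)
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)

# Map() applies the function to ("bad", "X1") and then to ("good", "X2")
res <- Map(function(cat, col) {
  # keep the rows of one category, then average one column
  mean(df[df$category == cat, col])
}, c("bad", "good"), c("X1", "X2"))

# Name the results like the map2 version does
names(res) <- paste0("mean_X_", c("bad", "good"))
str(res)
```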
Instead of filtering by 'category', the data can be grouped and then summarised with summarise_at:
df %>%
group_by(category) %>%
summarise_at(vars(matches("^X\\d+$")), mean)
# A tibble: 2 x 3
# category X1 X2
# <fct> <dbl> <dbl>
#1 bad 0.228 -0.438
#2 good -0.00465 0.355
That gives the same means without any gathering; the gather version below only transposes the results (one row per category and variable):
df %>%
gather(key = "variable", value = "value", -id, -category) %>%
group_by(category, variable) %>%
summarise(mean = mean(value))
# A tibble: 4 x 3
# Groups: category [2]
# category variable mean
# <fct> <chr> <dbl>
#1 bad X1 0.228
#2 bad X2 -0.438
#3 good X1 -0.00465
#4 good X2 0.355
Data:
set.seed(24)
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)
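To connect the per-column means to the permutation test the question asks about, here is a base R sketch (not from either answer). The helpers mean_diff() and perm_test() and the parameter n_perm are hypothetical names. For each X column it shuffles the good/bad labels, recomputes the difference in means, and reports a two-sided p-value with the common +1 correction so the p-value is never exactly zero.

```r
# Example data as in the question
set.seed(24)
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)

# Difference in group means (good minus bad) for one numeric vector
mean_diff <- function(x, groups) {
  mean(x[groups == "good"]) - mean(x[groups == "bad"])
}

# Shuffle the labels n_perm times and count permuted differences
# at least as large in absolute value as the observed one
perm_test <- function(x, groups, n_perm = 1000) {
  observed <- mean_diff(x, groups)
  perm <- replicate(n_perm, mean_diff(x, sample(groups)))
  p_value <- (sum(abs(perm) >= abs(observed)) + 1) / (n_perm + 1)
  c(diff = observed, p_value = p_value)
}

# Outer loop: one permutation test per X column, one row per column
x_cols <- grep("^X\\d+$", names(df), value = TRUE)
results <- t(sapply(x_cols, function(col) perm_test(df[[col]], df$category)))
print(results)
```

With only five rows there are just 10 distinct ways to place the two "good" labels, so on this toy data the p-values are very coarse; on the real data the same loop runs over all the X columns.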
Gather your data first (make it "long" instead of "wide") by using tidyr::gather, then summarise by grouping the categories and variables:
library(tidyverse)
df %>%
gather(key = "variable", value = "value", -id, -category) %>%
group_by(category, variable) %>%
summarise(mean = mean(value))
Here's the output:
# A tibble: 4 x 3
# Groups: category [2]
category variable mean
<fct> <chr> <dbl>
1 bad X1 -0.323
2 bad X2 0.342
3 good X1 0.0793
4 good X2 0.632
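The same long-format summary can be sketched without tidyr. This minimal base R version assumes the example data from the question, seeded with set.seed(24) (this answer used no seed, so its numbers will differ from the output above); the long data frame is built by hand with rep() and unlist() and then summarised with aggregate(). Note that in current tidyr, gather() is superseded by pivot_longer().

```r
# Example data as in the question
set.seed(24)
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)

x_cols <- grep("^X\\d+$", names(df), value = TRUE)

# Long format by hand: one row per (id, variable) pair;
# unlist() stacks the X columns in order, matching rep(..., each = nrow)
long <- data.frame(
  id       = rep(df$id, times = length(x_cols)),
  category = rep(df$category, times = length(x_cols)),
  variable = rep(x_cols, each = nrow(df)),
  value    = unlist(df[x_cols], use.names = FALSE)
)

# Mean of value for every category/variable combination
means <- aggregate(value ~ category + variable, data = long, FUN = mean)
print(means)
```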