Loop for Selecting and Summarising Each Column for Later Permutation

Question

I have a dataset similar to the one below. I need to run a permutation test for differences in means, using a loop. My main problem is that I have to loop through the columns of the dataset, and I don't know how.

df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- c(1, 2, 3, 4, 5)
df <- cbind(id, df, category)

  id         X1         X2 category 
1  1  0.5584823 -2.3135133     good     
2  2 -0.1115585  0.4731869     good     
3  3 -0.7435472 -0.0231894      bad      
4  4 -0.6673812  0.7470000      bad      
5  5 -1.2959973  0.4255970      bad

So inside the loop I basically need to do this:

df %>% filter(category == "bad") %>% select(X1) %>% summarise(mean_X_bad = mean(X1))
df %>% filter(category == "good") %>% select(X1) %>% summarise(mean_X_good = mean(X1))

I need this for both X1 and X2 (and 98 other X variables not shown here).

So for each X from 1 to 100, I need the mean of X in the good group and the mean of X in the bad group, so that I can then run a permutation test on the difference in means between the groups for every X.

I don't know how to write a loop that selects each column, splits it by category, and returns the mean of each subset. I assume that for the permutation test I need a vector of the "good" means and a vector of the "bad" means to compare, so I guess that has to be the result of the first loop?

Marian Minar · Accepted Answer

Gather your data first (make it "long" instead of "wide") using tidyr::gather, then group by category and variable and summarise. (In tidyr 1.0.0 and later, gather still works but is superseded by pivot_longer.)

library(tidyverse)

df %>%
  gather(key = "variable", value = "value", -id, -category) %>%
  group_by(category, variable) %>%
  summarise(mean = mean(value))

Here's the output (the exact values depend on the random draw from rnorm, so yours will differ):

# A tibble: 4 x 3
# Groups:   category [2]
  category variable    mean
  <chr>    <chr>      <dbl>
1 bad      X1       -0.323 
2 bad      X2        0.342 
3 good     X1        0.0793
4 good     X2        0.632
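
For the permutation step itself, here is a minimal base-R sketch of one way to go from per-group means to a test for every X column. It is not part of the code above: the helper names mean_diff and perm_test, the seed, the choice of 1000 permutations, and the two-sided p-value with a +1 correction are all assumptions made for illustration.

```r
set.seed(42)  # assumption: seeded only so the sketch is reproducible

# Rebuild the example data from the question
df <- data.frame(matrix(rnorm(10), nrow = 5))
category <- rep(c("good", "bad"), c(2, 3))
id <- 1:5
df <- cbind(id, df, category)

# Columns to test: everything except id and category
x_cols <- setdiff(names(df), c("id", "category"))

# Hypothetical helper: difference in group means (good minus bad)
mean_diff <- function(x, group) {
  mean(x[group == "good"]) - mean(x[group == "bad"])
}

# Hypothetical helper: permutation test for a single column.
# The category labels are shuffled n_perm times and the mean
# difference is recomputed for each shuffle.
perm_test <- function(x, group, n_perm = 1000) {
  observed <- mean_diff(x, group)
  permuted <- replicate(n_perm, mean_diff(x, sample(group)))
  # Two-sided p-value; the +1 counts the observed labelling itself
  p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
  c(observed = observed, p_value = p_value)
}

# Apply the test to every X column; one row per variable
results <- t(sapply(df[x_cols], perm_test, group = df$category))
results
```

Because sapply walks over the columns of df[x_cols], no explicit for loop is needed, and the same call should work unchanged with 100 X columns. With only five rows there are just ten distinct ways to assign two "good" labels, so the p-values from the toy data are very coarse.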
