Reputation: 1
I have a data frame in the following format:
> head(daten_strat)
  id age gender anxiety
1  7  40      2       7
2  3  53      1       8
3  4  40      1       4
4  1  62      2       8
5  5  60      2      11
6  6  45      1       8
I would like to create 4 random groups that are as similar as possible in terms of the distribution of gender, age and anxiety.
In a university course, we are planning an intervention with 4 different conditions. To assign the participants to the 4 conditions, I would like to use R to perform a stratified randomization. The end result should be 4 groups that are as similar as possible in terms of age, gender, and level of anxiety, so that (somewhat simplified) differences in effectiveness cannot be attributed to demographic differences between the groups.
Upvotes: 0
Views: 467
Reputation: 3901
I would not call this task stratified sampling, because you are not trying to get a representative sample of a population. What you are looking to do is partitioning. The anticlust package with its anticlustering() function provides a number of methods for this task. I'll show a basic example with defaults below. You might want to look into the methods more deeply if you want to use the partitioning for research purposes.
library(tidyverse)
library(anticlust)
set.seed(42)
# Example data
dat <- tibble(
  id = as.character(1:100),
  age = rnorm(100, 50, 10) |> round(),
  gender = sample(1:2, 100, replace = TRUE),
  anxiety = rnorm(100, 7.5, 2.25) |> round()
)
# Assign each row to one of K = 4 groups (basic usage with defaults)
dat <- dat |>
  mutate(group = anticlustering(dat[, -1], K = 4))
dat
#> # A tibble: 100 × 5
#> id age gender anxiety group
#> <chr> <dbl> <int> <dbl> <dbl>
#> 1 1 64 2 7 2
#> 2 2 44 2 4 1
#> 3 3 54 1 10 4
#> 4 4 56 2 7 3
#> 5 5 54 1 6 3
#> 6 6 49 1 5 3
#> 7 7 65 2 7 3
#> 8 8 49 2 6 2
#> 9 9 70 2 6 1
#> 10 10 49 2 10 2
#> # … with 90 more rows
As the group means below show, the groups differ very little on any of the variables.
# Means across groups
dat |>
  group_by(group) |>
  summarize(across(age:anxiety, mean))
#> # A tibble: 4 × 4
#> group age gender anxiety
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 50.3 1.48 7.48
#> 2 2 50.2 1.44 7.52
#> 3 3 50.5 1.44 7.4
#> 4 4 50.2 1.44 7.44
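Means alone do not show whether gender is split evenly or whether the spread within each group is comparable. The following is a minimal sketch of one possible variation, not the only way to do it. It assumes you want to treat gender as a categorical variable and pass it through the categories argument of anticlustering(), so that it is divided as evenly as possible across groups, and that you want age and anxiety put on a common scale with standardize = TRUE. The data are simulated again here so the block runs on its own, and the resulting groups will differ from the ones shown above.

```r
library(anticlust)

set.seed(42)

# Simulated example data with the same structure as above
dat <- data.frame(
  id = as.character(1:100),
  age = round(rnorm(100, 50, 10)),
  gender = sample(1:2, 100, replace = TRUE),
  anxiety = round(rnorm(100, 7.5, 2.25))
)

# Assumption: only the numeric variables enter the similarity computation;
# gender is handled separately as a categorical constraint
features <- dat[, c("age", "anxiety")]

dat$group <- anticlustering(
  features,
  K = 4,
  categories = dat$gender,  # balance gender across the 4 groups
  standardize = TRUE        # put age and anxiety on a common scale
)

# Group sizes
table(dat$group)

# Gender counts per group
gender_tab <- table(group = dat$group, gender = dat$gender)
gender_tab

# Means and standard deviations of the numeric variables per group
aggregate(cbind(age, anxiety) ~ group, data = dat, FUN = mean)
aggregate(cbind(age, anxiety) ~ group, data = dat, FUN = sd)
```

If the standard deviations differ more than you would like, the objective argument of anticlustering() offers alternatives to the default that are worth reading up on in the package documentation before using the grouping in a study.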
Upvotes: 1