ChenG

Reputation: 19

Aggregate AND count data in R

I have a data frame with N participants. Each participant has 50 trials, half of them with condition A and half with condition B. In each trial, they either got 0 or 1 in a certain variable. I need to count the occurrences of the 0's or 1's for each participant, in each of the conditions.

So far, I have tried something like this:

the_answer = aggregate(certain_variable==0 ~ participant, data = data[data$condition=="A" , ], FUN = sum, na.rm = TRUE)

The problem is that the result contains a different number of participants each time, instead of the same N participants with different counts of the variable.

I hope that was clear enough. I would really appreciate any help.

Thanks!

Upvotes: 0

Views: 1129

Answers (1)

socialscientist

Reputation: 4232

Generate example data

###########################################################################
# Set-up
###########################################################################

# Packages
library(tibble)
library(dplyr)

# Simulation parameters
set.seed(123)
participant_n <- 3
trial_n <- 50
trials_per_arm <- trial_n * 0.5
outcome_prob_A <- 0.8
outcome_prob_B <- 0.2

###########################################################################
# Simulate data
###########################################################################

# Participant and trial structure: one row per participant per trial
data <- tibble(
  participant = rep(1:participant_n, trial_n),
  trial = rep(1:trial_n, each = participant_n)
)

# Randomly assign half of the trials to each condition, letting the trials
# assigned vary across participants
data <- data %>%
  group_by(participant) %>%
  mutate(
    condition = sample(rep(c("A", "B"), trials_per_arm),
                       trial_n,
                       replace = FALSE),
    outcome = case_when(
      condition == "A" ~ rbinom(n(), 1, outcome_prob_A),
      condition == "B" ~ rbinom(n(), 1, outcome_prob_B)
    )
  )

# Print the simulated data
data
#> # A tibble: 150 x 4
#> # Groups:   participant [3]
#>    participant trial condition outcome
#>          <int> <int> <chr>       <int>
#>  1           1     1 A               1
#>  2           2     1 A               1
#>  3           3     1 B               0
#>  4           1     2 A               1
#>  5           2     2 B               0
#>  6           3     2 B               1
#>  7           1     3 B               1
#>  8           2     3 A               1
#>  9           3     3 B               0
#> 10           1     4 A               1
#> # ... with 140 more rows

Count each outcome for each participant

data %>%
  group_by(participant, condition, outcome) %>%
  tally() %>%
  ungroup()
#> # A tibble: 12 x 4
#>    participant condition outcome     n
#>          <int> <chr>       <int> <int>
#>  1           1 A               0     2
#>  2           1 A               1    23
#>  3           1 B               0    21
#>  4           1 B               1     4
#>  5           2 A               0     5
#>  6           2 A               1    20
#>  7           2 B               0    22
#>  8           2 B               1     3
#>  9           3 A               0     4
#> 10           3 A               1    21
#> 11           3 B               0    22
#> 12           3 B               1     3
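
One caveat with `tally()`: a participant/condition/outcome combination that never occurs gets no row at all, rather than a row with `n = 0`. That might be one reason a result ends up with fewer rows or participants than expected. Here is a minimal base R sketch that keeps the zero counts via `table()`. The data frame `toy` is made up for illustration; it is not the simulated data above.

```r
# Toy data (hypothetical): participant 2 never scores 0 in condition A,
# and participant 1 never scores 1 in condition B
toy <- data.frame(
  participant = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition   = c("A", "A", "B", "B", "A", "A", "B", "B"),
  outcome     = c(0, 1, 0, 0, 1, 1, 0, 1)
)

# table() cross-tabulates every combination of the three variables,
# including combinations that never occur (these get a count of 0)
counts <- as.data.frame(
  table(participant = toy$participant,
        condition   = toy$condition,
        outcome     = toy$outcome),
  responseName = "n"
)

counts
```

Every participant then appears once per condition and outcome, so the number of rows no longer depends on which outcomes happened to occur.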

# If you just want counts for each outcome for each condition:
data %>%
  group_by(condition, outcome) %>%
  tally() %>%
  ungroup()
#> # A tibble: 4 x 3
#>   condition outcome     n
#>   <chr>       <int> <int>
#> 1 A               0    11
#> 2 A               1    64
#> 3 B               0    65
#> 4 B               1    10
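
If you would rather stay with `aggregate()` as in the question, note that its formula method drops rows containing `NA` before `FUN` is ever called (`na.action = na.omit` by default). So `na.rm = TRUE` does not bring those participants back. Whether that is what happens in your data is a guess on my part. The sketch below shows the difference using a made-up data frame (`toy`) and a hypothetical helper (`count_zeros()`):

```r
# Toy data (hypothetical): participant 3 has only missing outcomes in condition A
toy <- data.frame(
  participant = rep(1:3, each = 4),
  condition   = rep(c("A", "A", "B", "B"), times = 3),
  outcome     = c(0, 1, 0, 0,
                  1, 1, 0, 1,
                  NA, NA, 1, 0)
)

# Count the zeros in a vector, ignoring missing values
count_zeros <- function(x) sum(x == 0, na.rm = TRUE)

# Default behaviour: rows with NA are removed first, so the
# participant 3 / condition A group disappears from the result
dropped <- aggregate(outcome ~ participant + condition,
                     data = toy, FUN = count_zeros)

# na.action = na.pass keeps those rows; the function itself handles the NAs
kept <- aggregate(outcome ~ participant + condition,
                  data = toy, FUN = count_zeros, na.action = na.pass)

dropped
kept
```

With `na.pass`, every participant should show up in every condition, and an all-missing group is reported with a count of 0 instead of being left out.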

Upvotes: 1
