Reputation: 311
Sorry, I am probably using the wrong search terms but I couldn't find a solution.
Given an experiment with two participants (id), each performing a task 6 times under two varying parameters (par1,par2):
id <- c(rep(1,6),rep(2,6))
par1 <- c(rep("a",9),rep("b",3))
par2 <- c(rep("c",3),rep("d",9))
val <- rnorm(12)
data <- data.frame(id,par1,par2,val)
How can I replace all rows with identical values for "id","par1" and "par2" by a single row in which the value of "val" is the mean of the "val" values of the replaced rows?
The outcome is thus a table like this:
id par1 par2 val
1 a c (mean of row 1-3)
1 a d (mean of row 4-6)
2 a d (mean of row 7-9)
2 b d (mean of row 10-12)
Upvotes: 2
Views: 96
Reputation: 886938
Here is an option with data.table. Convert the 'data.frame' to a 'data.table' with setDT(data), group by 'id', 'par1', and 'par2', and take the mean of 'val':
library(data.table)
setDT(data)[, .(val = mean(val)), by = .(id, par1, par2)]
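If adding a package is not an option, the same grouped mean can probably be sketched in base R with aggregate(). This is only an illustration: the data frame name `df`, the result name `agg`, and the seed are my own assumptions, not part of the question.

```r
# Rebuild the example data from the question.
# The seed is an assumption, added only so the result is reproducible.
set.seed(123)
df <- data.frame(
  id   = c(rep(1, 6), rep(2, 6)),
  par1 = c(rep("a", 9), rep("b", 3)),
  par2 = c(rep("c", 3), rep("d", 9)),
  val  = rnorm(12)
)

# One row per observed (id, par1, par2) combination,
# with val replaced by the mean of val within that group.
agg <- aggregate(val ~ id + par1 + par2, data = df, FUN = mean)
print(agg)
```

The formula puts the column to summarise on the left and the grouping columns on the right, so only combinations that actually occur in the data end up in the result.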
Upvotes: 1
Reputation: 3053
For a dplyr approach:
library(dplyr)
set.seed(123) # for reproducibility
id <- c(rep(1, 6), rep(2, 6))
par1 <- c(rep("a", 9), rep("b", 3))
par2 <- c(rep("c", 3), rep("d", 9))
val <- rnorm(12)
data <- data.frame(id, par1, par2, val)
# group by all variables except `val`
data %>% group_by_at(vars(-val)) %>% summarize(val = mean(val))
Which gives:
# A tibble: 4 x 4
# Groups: id, par1 [?]
id par1 par2 val
<dbl> <fctr> <fctr> <dbl>
1 1 a c 0.2560184
2 1 a d 0.6382870
3 2 a d -0.4969993
4 2 b d 0.3794112
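As a variation, the grouping columns can also be named explicitly instead of selected by exclusion. The sketch below assumes dplyr 1.0.0 or later, where summarise() accepts a `.groups` argument; the names `dat` and `res` are mine, chosen for illustration.

```r
library(dplyr)

# Same example data as above; the seed matches the one used in this answer.
set.seed(123)
dat <- data.frame(
  id   = c(rep(1, 6), rep(2, 6)),
  par1 = c(rep("a", 9), rep("b", 3)),
  par2 = c(rep("c", 3), rep("d", 9)),
  val  = rnorm(12)
)

# Group by the three identifying columns, average val within each group,
# and drop the grouping afterwards so the result is an ungrouped tibble.
res <- dat %>%
  group_by(id, par1, par2) %>%
  summarise(val = mean(val), .groups = "drop")
print(res)
```

Naming the columns is more typing, but it keeps working if more value columns are added to the data later.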
Upvotes: 2