How to average across several rows, for many columns, using data.table in R?

Question

I have a dataset where pairs of rows can have the same value on variable X1. I would like to average these paired rows' values in columns 2:40, into a new single row for each. Is there an easy way to do this?

If it were just one column I was averaging I think I could do this:

d[, X2 := X2, by = X1]

But this becomes very tedious for multiple columns. Is there a way to do this in data.table without having to type out X := X for each column?

Edit:

Here is a reproducible example. I would essentially like to end up with ten rows, one for each value of "cat". These rows would contain averages of x1, x2 and x3, for that level of "cat".

cat <- rep(1:10, times = 2)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- rnorm(20)

dat <- cbind(cat, x1, x2, x3)

dat <- as.data.frame(dat)

jared_mamrot · Accepted Answer

I'm not sure if this solution will suit, as you haven't provided a minimal reproducible example, but perhaps something like this?

library(data.table)

df <- data.frame(X1 = rep(1:50, each = 2),
                 X2 = rep(x = 1:2, times = 50),
                 X3 = rep(x = 1:2, times = 50),
                 X4 = rep(x = 1:2, times = 50),
                 X5 = rep(x = 1:2, times = 50),
                 X6 = rep(x = 1:2, times = 50),
                 X7 = rep(x = 1:2, times = 50),
                 X8 = rep(x = 1:2, times = 50),
                 X9 = rep(x = 1:2, times = 50),
                 X10 = rep(x = 1:2, times = 50)
                 )
setDT(df)
head(df)
#>    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1:  1  1  1  1  1  1  1  1  1   1
#> 2:  1  2  2  2  2  2  2  2  2   2
#> 3:  2  1  1  1  1  1  1  1  1   1
#> 4:  2  2  2  2  2  2  2  2  2   2
#> 5:  3  1  1  1  1  1  1  1  1   1
#> 6:  3  2  2  2  2  2  2  2  2   2

df2 <- df[ ,lapply(.SD, mean), by = X1, .SDcols = X2:X10]
head(df2)
#>    X1  X2  X3  X4  X5  X6  X7  X8  X9 X10
#> 1:  1 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 2:  2 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 3:  3 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 4:  4 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 5:  5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 6:  6 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5

^{Created on 2021-07-16 by the reprex package (v2.0.0)}

--

Or maybe this?

library(data.table)

df <- data.frame(X1 = 1:100,
                 X2 = rep(x = 1:2, times = 50),
                 X3 = rep(x = 1:2, times = 50),
                 X4 = rep(x = 1:2, times = 50),
                 X5 = rep(x = 1:2, times = 50),
                 X6 = rep(x = 1:2, times = 50),
                 X7 = rep(x = 1:2, times = 50),
                 X8 = rep(x = 1:2, times = 50),
                 X9 = rep(x = 1:2, times = 50),
                 X10 = rep(x = 1:2, times = 50)
                 )
setDT(df)
head(df)
#>    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1:  1  1  1  1  1  1  1  1  1   1
#> 2:  2  2  2  2  2  2  2  2  2   2
#> 3:  3  1  1  1  1  1  1  1  1   1
#> 4:  4  2  2  2  2  2  2  2  2   2
#> 5:  5  1  1  1  1  1  1  1  1   1
#> 6:  6  2  2  2  2  2  2  2  2   2

df2 <- df[, lapply(.SD, mean, na.rm=TRUE), X1-0:1]
head(df2)
#>    X1  X2  X3  X4  X5  X6  X7  X8  X9 X10
#> 1:  1 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 2:  3 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 3:  5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 4:  7 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 5:  9 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 6: 11 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5

^{Created on 2021-07-16 by the reprex package (v2.0.0)}

How to average across several rows, for many columns, using data.table in R?

Answers (1)

Related Questions