dh han

Reputation: 23

Why doesn't mean() average each row in data.table?

I am confused about why I can't get the average value of each row with mean() in data.table.

> aaa <- data.table(matrix(1:9, nrow = 3))
> aaa[, `:=` (avg = mean(V1 + V2 +V3), onethird = (V1 + V2 +V3)/3)]
> aaa
   V1 V2 V3 avg onethird
1:  1  4  7  15        4
2:  2  5  8  15        5
3:  3  6  9  15        6

It seems what data.table did is mean(V1) + mean(V2) + mean(V3) rather than mean(V1 + V2 +V3).

~~~~~~~~~~~~~~~~~~~~~~~~~

What I actually want is to create new columns that hold the row-wise average of other columns, for example avg12 from V1 and V2, and avg345 from V3, V4 and V5.

> aaa <- data.table(matrix(1:10, nrow = 2))
> aaa[, `:=` (avg12 = (V1 + V2)/2, avg345 = (V3 + V4 + V5)/3)]
> aaa
   V1 V2 V3 V4 V5 avg12 avg345
1:  1  3  5  7  9     2      7
2:  2  4  6  8 10     3      8

Is there a simple mean-style function that I can apply to (V1 + V2) or (V1, V2) instead of writing the division by hand?

Upvotes: 2

Views: 84

Answers (1)

akrun

Reputation: 887153

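We can use rowMeans to get the mean of each row. mean() collapses its whole input vector into a single number (here 15, the mean of the row sums 12, 15 and 18), and that one value is then recycled down the column. rowMeans can also be applied directly to the dataset through .SD (Subset of Data); when we don't specify .SDcols, .SD takes all the columns in the dataset.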

aaa[, `:=` (avg = rowMeans(.SD), onethird = (V1 + V2 + V3)/3)]
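For the second part of the question (avg12 and avg345), the same idea can probably be applied to a subset of columns by passing .SDcols. This is only a minimal sketch on the 2-row example from the question; it uses one := call per new column so that each call can have its own .SDcols.

```r
library(data.table)

# Same toy data as in the question: 2 rows, columns V1..V5
aaa <- data.table(matrix(1:10, nrow = 2))

# rowMeans() averages across columns within each row;
# .SDcols restricts .SD to the columns we want to average
aaa[, avg12  := rowMeans(.SD), .SDcols = c("V1", "V2")]
aaa[, avg345 := rowMeans(.SD), .SDcols = c("V3", "V4", "V5")]

print(aaa)
```

Naming the columns in .SDcols also keeps previously added columns (such as avg12) out of the second average.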

Another option is to get the row-wise sum with Reduce and then divide by the number of columns (length(.SD)).

aaa[, `:=` (avg = Reduce(`+`, .SD)/length(.SD), onethird = (V1 + V2 + V3)/3)]
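To check that the two approaches agree, and to see what mean() does differently, here is a small sketch on the 3x3 example from the question. The column names avg_rowmeans, avg_reduce and avg_scalar are made up for illustration, and .SDcols is set explicitly (an assumption on our side) so that newly added columns are not fed back into .SD.

```r
library(data.table)

# Same toy data as in the question: 3 rows, columns V1..V3
aaa <- data.table(matrix(1:9, nrow = 3))
cols <- c("V1", "V2", "V3")

# Row-wise mean, two ways
aaa[, avg_rowmeans := rowMeans(.SD), .SDcols = cols]
aaa[, avg_reduce   := Reduce(`+`, .SD) / length(.SD), .SDcols = cols]

# mean() reduces the vector of row sums to one number,
# which := then recycles to every row
aaa[, avg_scalar := mean(V1 + V2 + V3)]

print(aaa)
```

Both row-wise columns match the onethird column from the question, while avg_scalar repeats a single value in every row.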

Upvotes: 1
