Yaser

Reputation: 63

How to loop over variables indexed by a number in data.table?

I'm new to R and could use some help with the following problem:

I have a fairly large dataset stored as a data.table, and I want to loop over a group of variables that are indexed by a number (say x_1, x_2, ..., x_n). To keep things simple, suppose I want to take the mean of each variable within each value of another variable y, and store the results as new columns (m_1, m_2, ..., m_n) in my data.table.

Can someone suggest efficient code for this? There are too many x_* variables to handle one by one.

Thanks

Upvotes: 3

Views: 47

Answers (2)

Gregor Thomas

Reputation: 145765

Very simply and efficiently:

library(data.table)
ind = 1:5 # replace 5 with your n
for (i in ind) {
  # add column m_i by reference, holding the overall (ungrouped) mean of x_i
  set(df, j = paste("m", i, sep = "_"), value = mean(df[[paste("x", i, sep = "_")]]))
}

set is usually extremely fast. It doesn't allow grouped operations, so if you need to group by another column, you'll need a different approach, for example:

ind = 1:5
df[, paste("m", ind, sep = "_") := lapply(.SD, mean), .SDcols = paste("x", ind, sep = "_")]

To compute the means within each level of another column (such as y in the question), add by = y to the call above.
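As a rough sketch of the grouped version, assuming a small made-up table dt with a grouping column y and three columns x_1 to x_3 (the data and the names dt, x_cols, m_cols and means are illustrative only, not from the question):

```r
library(data.table)

# Toy data (made up for illustration): y is the grouping column,
# x_1..x_3 are the number-indexed columns.
dt <- data.table(
  y   = rep(c("a", "b"), each = 3),
  x_1 = c(1, 2, 3, 4, 5, 6),
  x_2 = c(10, 20, 30, 40, 50, 60),
  x_3 = c(0, 0, 3, 1, 1, 1)
)

ind    <- 1:3
x_cols <- paste("x", ind, sep = "_")
m_cols <- paste("m", ind, sep = "_")

# Add the group means as new columns m_1..m_3, by reference.
# Every row gets the mean of its own y group.
dt[, (m_cols) := lapply(.SD, mean), by = y, .SDcols = x_cols]

# Alternative: a summary table with one row per value of y.
means <- dt[, lapply(.SD, mean), by = y, .SDcols = x_cols]
setnames(means, x_cols, m_cols)
```

The first call keeps all the original rows and repeats each group mean within its group; the second returns a separate, smaller table, which may be more convenient if you only need the means themselves.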

Upvotes: 5

Gregory

Reputation: 4279

This approach works with dplyr; not sure how to do the same with data.table.

library(dplyr)

df <- tibble(group = factor(rep(letters[1:4], 5)), 
             x_1 = rnorm(20, mean = 10), 
             x_2 = rnorm(20, mean = 20), 
             x_3 = rnorm(20, mean = 30))

group_by(df, group) %>%
  summarize_all(.funs = c(mean, sd))

# # A tibble: 4 x 7
# group x_1_fn1 x_2_fn1 x_3_fn1 x_1_fn2 x_2_fn2 x_3_fn2
# <fct>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
# 1 a       10.1     19.9    30.1   0.684   0.792   0.461
# 2 b        9.99    19.2    30.2   1.14    1.20    0.960
# 3 c        9.32    20.3    30.0   0.762   0.721   1.56 
# 4 d        9.89    19.9    29.9   1.29    1.39    0.589

Upvotes: 2
