How to loop over variables indexed by a number in data.table?

Question

I'm new to R and could use some help with the following problem:

I have a rather large dataset in data.table format, and I want to loop over a group of variables that are indexed by a number (say, x_1, x_2, ..., x_n). To keep things simple, let's say I want to take the mean of each variable for each value of another variable y, and store the results as m_1, m_2, ..., m_n in my data.table.

Can someone suggest efficient code that does this? Both n and the number of variable groups like x_* are too large for me to do this one by one.

Thanks

Gregor Thomas · Accepted Answer

For an ungrouped mean, a simple and efficient approach is a loop with set:

ind = 1:5 # replace 5 with your n
for (i in ind) {
  set(df, j = paste("m", i, sep = "_"), value = mean(df[[paste("x", i, sep = "_")]]))
}

set is usually extremely fast. It doesn't allow grouped operations, so if you need to group by another column, you'll need a different approach, for example:

ind = 1:5
df[, paste("m", ind, sep = "_") := lapply(.SD, mean), .SDcols = paste("x", ind, sep = "_")]

In the := version above, you can add the by argument as usual (for example, by = y) to compute the means within each group.
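As a rough illustration of the grouped case from the question, here is a minimal sketch on made-up data. The toy table, its values, and the helper names x_cols, m_cols and group_means are assumptions for the example, not part of the original answer.

```r
library(data.table)

# Hypothetical toy data: two indexed columns and a grouping column y
df <- data.table(
  y   = c("a", "a", "b", "b"),
  x_1 = c(1, 3, 5, 7),
  x_2 = c(10, 20, 30, 50)
)

ind    <- 1:2  # replace 2 with your n
x_cols <- paste("x", ind, sep = "_")
m_cols <- paste("m", ind, sep = "_")

# Add the group means as new columns; each row gets the mean of its y group
df[, (m_cols) := lapply(.SD, mean), by = y, .SDcols = x_cols]

# Alternatively, collapse to one row per value of y
group_means <- df[, lapply(.SD, mean), by = y, .SDcols = x_cols]
setnames(group_means, x_cols, m_cols)
```

The first form keeps every row and repeats the group mean on each one; the second returns a smaller table with one row per group, which may be preferable if you only need the summary.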
