AlexP

Reputation: 647

How to pass a variable column name to the "by" argument in data.table?

I often use the data.table package in R to summarize data. In this particular case, I'm just counting the number of occurrences in a dataset for given column groups, but I'm having trouble doing this dynamically inside a loop.

Normally, I'd summarize the data like this:

library(data.table)
library(ggplot2)  # provides the mpg dataset
data <- data.table(mpg)
data.temp1 <- data[, .N, by="manufacturer,class"]
data.temp2 <- data[, .N, by="manufacturer,trans"]

But now I want to loop through the columns of interest in my dataset and plot each summary. Rather than repeating the code over and over, I want to put it in a for loop, something like this:

columns <- c('class', 'trans')

for (i in 1:length(columns)) {
    data.temp <- data[, .N, by=list(manufacturer,columns[i])]
    #plot data
}

If I only wanted to group by the column of interest, this works inside the loop:

data.temp <- data[, .N, by=get(columns[i])]

But as soon as I add a static column name, like manufacturer, alongside it, it breaks. I can't figure out how to mix a static column name with a dynamic one, and I haven't been able to find an answer.

Would appreciate any thoughts!

Upvotes: 1

Views: 104

Answers (1)

MrFlick

Reputation: 206253

You should be fine if you just quote "manufacturer" and pass both names as a character vector:

data.temp <- data[, .N, by=c("manufacturer",columns[i])]
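Dropped into the original loop, that looks roughly like the sketch below. It is a minimal illustration, not a drop-in replacement: the small `dt` table is hypothetical toy data standing in for `data.table(mpg)` (so it runs without ggplot2), and collecting the summaries in a `results` list is just one assumed way to keep them around for plotting.

```r
library(data.table)

# Hypothetical toy data standing in for data.table(mpg)
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "ford"),
  class        = c("compact", "compact", "suv", "suv", "pickup"),
  trans        = c("auto", "manual", "auto", "auto", "manual")
)

columns <- c("class", "trans")
results <- list()  # hypothetical container for the per-column summaries

for (col in columns) {
  # by= accepts a character vector, so the fixed name "manufacturer"
  # and the name held in `col` can be combined with c()
  data.temp <- dt[, .N, by = c("manufacturer", col)]
  results[[col]] <- data.temp
  # plot data.temp here
}

print(results$trans)
```

Each summary keeps the grouping columns under their real names (`manufacturer` plus `class` or `trans`), followed by the count column `N`.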

From the `?'[.data.table'` help page, `by=` can be:

A single unquoted column name, a list() of expressions of column names, a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end), or a character vector of column names.

This seems like the easiest way to give you what you need.
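As a quick sanity check of the forms quoted from the help page, the sketch below groups the same hypothetical toy table three ways and confirms they agree. The character-vector form is the one that lets a variable holding a column name (here the hypothetical `col`) be dropped in directly, which is what makes it convenient in a loop.

```r
library(data.table)

# Hypothetical toy data; only the column names matter here
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "ford"),
  class        = c("compact", "compact", "suv", "suv", "pickup")
)

col <- "class"  # hypothetical variable holding the "dynamic" column name

by_list   <- dt[, .N, by = list(manufacturer, class)]  # list() of unquoted names
by_string <- dt[, .N, by = "manufacturer,class"]       # comma-separated string
by_vector <- dt[, .N, by = c("manufacturer", col)]     # character vector

# All three group by the same columns in the same order
stopifnot(isTRUE(all.equal(by_list, by_string)),
          isTRUE(all.equal(by_list, by_vector)))
```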

Upvotes: 5
