Reputation: 462

How to reference a variable in a for loop?

I am looping through several data.tables and through the variables in each data.table, but I'm having trouble referencing the variables inside the for loop:

library(data.table)

dt1 <- data.table(a1 = c(1,2,3), a2 = c(4,5,2))
dt2 <- data.table(a1 = c(1,43,1), a2 = c(52,4,1))

For each data.table, I want the average of each variable over the observations where that variable != 1. Below is my attempt, which doesn't work:

dtname = 'dt'
ind  = c('1', '2')
for (d in ind) {
  df <- get(paste0('dt', d, sep=''))
  for (v in ind) {
    varname <- paste0('a', v, sep='')
    df1 <- df %>%
      filter(varname!=1) %>%
      summarise(varname = mean(varname))
    print(df1)
  }
}

The desired output is the average of a1 = c(2,3) in dt1, of a2 = c(4,5,2) in dt1, of a1 = c(43) in dt2, and of a2 = c(52,4) in dt2, each printed.

What am I doing wrong here? More generally, how should I reference a variable inside a for loop when its name (varname) is pieced together from the loop index (v) and something else?

Upvotes: 2

Answers (4)

Amazonian

Reputation: 462

I figured out a solution based on the comments of @Amar and @Scott Ritchie:

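for (d in ind) {
  df <- get(paste0('dt', d))
  for (v in ind) {
    varname <- paste0('a', v)
    # as.name() turns the string into a symbol; evaluated inside df[...],
    # that symbol resolves to the column with that name
    df1 <- df[eval(as.name(varname)) != 1,
              .(mean = mean(eval(as.name(varname))))]
    print(df1)
  }
}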
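For comparison, data.table also lets you look a column up by name with get() inside DT[...], which avoids the eval(as.name(...)) wrapping. A minimal sketch of the same loop in that style, assuming the dt1/dt2 definitions from the question; out is a hypothetical list added only to collect the results:

```r
library(data.table)

dt1 <- data.table(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.table(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

ind <- c("1", "2")
out <- list()  # hypothetical container for the means
for (d in ind) {
  df <- get(paste0("dt", d))
  for (v in ind) {
    varname <- paste0("a", v)
    # Inside DT[...], get(varname) returns the column whose name is stored in varname
    df1 <- df[get(varname) != 1, .(mean = mean(get(varname)))]
    print(df1)
    out[[paste0("dt", d, "_", varname)]] <- df1$mean
  }
}
```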

Thanks EVERYONE!

Upvotes: 1

adono

Reputation: 18

It is not very clear what you are trying to do, but if you want to replace all of the values in each column with that column's average, I would suggest using a data.frame instead, as it is easier to index. Here is code that should work:

dt1 <- data.frame(a1 = c(1,2,3), a2 = c(4,5,2))
dt2 <- data.frame(a1 = c(1,43,1), a2 = c(52,4,1))

ind <- c('1', '2')
for (d in ind) {
  df <- get(paste0('dt', d))
  for (j in 1:ncol(df)) {
    # replace the whole column with the mean of its values that are not 1
    df[, j] <- mean(df[df[, j] != 1, j])
  }
  print(df)
}

Your code was not working because varname is just a string, not a reference to the column. You can see this by printing the class of varname:

dtname = 'dt'
ind  = c('1', '2')
for (d in ind) {
  df <- get(paste0('dt', d, sep=''))
  for (v in ind) {
    varname <- paste0('a', v, sep='')
    print(class(varname))
  }
}

This just prints "character" on every iteration.
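Concretely, inside filter() that string is a constant: "a1" != 1 compares the text "a1" with 1 (coerced to "1"), so it is TRUE and no rows are dropped, while mean(varname) averages a character string, which returns NA with a warning. One way to keep the original dplyr pipeline is the .data pronoun. A sketch, assuming a dplyr version that supports .data[[...]] and the !!name := syntax for naming the output column; results is a hypothetical list used only to collect the means:

```r
library(data.table)
library(dplyr)

dt1 <- data.table(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.table(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

# A plain string is a constant: this is TRUE, so filter("a1" != 1) keeps every row
print("a1" != 1)

ind <- c("1", "2")
results <- list()  # hypothetical container for the means
for (d in ind) {
  df <- get(paste0("dt", d))
  for (v in ind) {
    varname <- paste0("a", v)
    # .data[[varname]] refers to the column whose name is stored in varname;
    # !!varname := names the summary column after it as well
    df1 <- df %>%
      filter(.data[[varname]] != 1) %>%
      summarise(!!varname := mean(.data[[varname]]))
    print(df1)
    results[[paste0("dt", d, "_", varname)]] <- df1[[varname]]
  }
}
```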

Another solution, using the variable name with a data.frame, is to index the df with [[ and the variable itself (no quotes, otherwise R looks for a column literally called "varname"):

df[[varname]]
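Put into the full loop, [[ with the unquoted variable might look like this. A sketch assuming the data.frame versions of dt1 and dt2 from this answer; means is a hypothetical named vector used only to collect the results:

```r
dt1 <- data.frame(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.frame(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

means <- numeric(0)  # hypothetical named vector of results
for (d in c("1", "2")) {
  df <- get(paste0("dt", d))
  for (v in c("1", "2")) {
    varname <- paste0("a", v)
    x <- df[[varname]]  # the column itself, looked up by the name stored in varname
    means[paste0("dt", d, "$", varname)] <- mean(x[x != 1])
  }
}
print(means)
```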

Here are two helpful links for this kind of operation:
* link 1: How to find the mean of a column
* link 2: Data frames

Upvotes: 0

Scott Ritchie

Reputation: 10543

For a pure data.table approach, I would combine the data.tables into one and compute the averages by group:

# Concatenate the data.tables: 
all_dt <- rbind("dt1" = dt1, "dt2" = dt2, idcol = "origin")
all_dt
#    origin a1 a2
# 1:    dt1  1  4
# 2:    dt1  2  5
# 3:    dt1  3  2
# 4:    dt2  1 52
# 5:    dt2 43  4
# 6:    dt2  1  1

# Melt so that "a1" and "a2" are labels in a group column:
all_dt <- melt(all_dt, id.vars="origin")
all_dt
#     origin variable value
#  1:    dt1       a1     1
#  2:    dt1       a1     2
#  3:    dt1       a1     3
#  4:    dt2       a1     1
#  5:    dt2       a1    43
#  6:    dt2       a1     1
#  7:    dt1       a2     4
#  8:    dt1       a2     5
#  9:    dt1       a2     2
# 10:    dt2       a2    52
# 11:    dt2       a2     4
# 12:    dt2       a2     1

# Compute averages by each data.table and column group, ignoring 1s:
all_dt[value != 1, .(mean = mean(value)), by = .(origin, variable)]
#    origin variable      mean
# 1:    dt1       a1  2.500000
# 2:    dt2       a1 43.000000
# 3:    dt1       a2  3.666667
# 4:    dt2       a2 28.000000
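If you would rather not melt, the same per-table averages can be computed in wide form with lapply() over .SD, giving one result column per variable. This is an alternative sketch, not part of the original answer; it builds the combined table with rbindlist() on a named list instead of rbind():

```r
library(data.table)

dt1 <- data.table(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.table(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

# The list names become the values of the "origin" column
all_dt <- rbindlist(list(dt1 = dt1, dt2 = dt2), idcol = "origin")

# .SD holds every column except the grouping column; each is averaged without its 1s
res <- all_dt[, lapply(.SD, function(x) mean(x[x != 1])), by = origin]
print(res)
```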

Upvotes: 3

tjebo

Reputation: 23797

I would go for a vectorised approach. You are using R!

One possible way:

library(dplyr)

dt1[dt1 == 1] <- NA  # replace 1 with NA

dt1 %>% summarise_all(mean, na.rm = TRUE)  # mean of every column, ignoring the NAs

   a1       a2
1 2.5 3.666667
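The snippet above handles only dt1 and overwrites its 1s with NA. A sketch of the same vectorised idea for both tables without modifying them, swapping dplyr for base R's sapply() (this swap is not part of the original answer):

```r
library(data.table)

dt1 <- data.table(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.table(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

# mget() collects the tables by name into a named list: list(dt1 = ..., dt2 = ...)
tables <- mget(c("dt1", "dt2"))

# A data.table is a list of columns, so the inner sapply() visits each column and
# the outer sapply() visits each table; values equal to 1 are dropped before averaging
res <- sapply(tables, function(d) sapply(d, function(x) mean(x[x != 1])))
print(res)  # rows are variables (a1, a2), columns are tables (dt1, dt2)
```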

Upvotes: 0
