Reputation: 462
I am looping through different data.tables and the variables in each data.table, but I'm having trouble referencing the variables inside the for loop.
dt1 <- data.table(a1 = c(1,2,3), a2 = c(4,5,2))
dt2 <- data.table(a1 = c(1,43,1), a2 = c(52,4,1))
For each data.table, I want to find the average of each variable over the observations where that variable != 1. Below is my attempt, which doesn't work:
dtname = 'dt'
ind = c('1', '2')
for (d in ind) {
  df <- get(paste0('dt', d, sep=''))
  for (v in ind) {
    varname <- paste0('a', v, sep='')
    df1 <- df %>%
      filter(varname!=1) %>%
      summarise(varname = mean(varname))
    print(df1)
  }
}
The desired output is to take and print the average of a1 = c(2,3) in dt1, the average of a2 = c(4,5,2) in dt1, the average of a1 = c(43) in dt2, and the average of a2 = c(52,4) in dt2.
What am I doing wrong here? In general, how should I reference a variable inside a for loop (varname) whose name is pieced together from the looping index (v) and something else?
Upvotes: 2
Views: 2295
Reputation: 462
I figured out a solution based on the comments of @Amar and @Scott Richie:
for (d in ind) {
  df <- get(paste0('dt', d, sep=''))
  for (v in ind) {
    varname <- paste0('a', v, sep='')
    # eval(as.name(varname)) turns the string into the column it names
    df1 <- df[eval(as.name(varname))!=1, .(mean = mean(eval(as.name(varname))))]
    print(df1)
  }
}
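The same lookup can also be written with get(), which data.table evaluates against the columns when it is called inside [ ]. This is only a sketch of that alternative, not a replacement for the loop above: the named list `tables` and the `results` vector are names made up for the illustration, and the list is an assumption that avoids building table names with paste0.

```r
library(data.table)

dt1 <- data.table(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.table(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

# Hypothetical helper objects for this sketch
tables <- list(dt1 = dt1, dt2 = dt2)
results <- numeric(0)

for (tname in names(tables)) {
  for (varname in c("a1", "a2")) {
    # get(varname) resolves the string to the column of that name
    res <- tables[[tname]][get(varname) != 1, .(mean = mean(get(varname)))]
    results[paste(tname, varname, sep = ".")] <- res$mean
    print(res)
  }
}
```

This relies on no column being called `varname` or `tname`; if one were, data.table would pick up the column instead of the loop variable.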
Thanks EVERYONE!
Upvotes: 1
Reputation: 18
It is not very clear what you are trying to do, but if you want to replace all of the rows in the data frame with the mean of the previous data frame's columns, I would suggest using a data.frame instead, as it is easier to index. Here is code that should work:
dt1 <- data.frame(a1 = c(1,2,3), a2 = c(4,5,2))
dt2 <- data.frame(a1 = c(1,43,1), a2 = c(52,4,1))
dtname = 'dt'
ind = c('1', '2')
for (d in ind){
  df <- get(paste0('dt', d, sep=''))
  for (i in 1:nrow(df)){
    for (j in 1:ncol(df)){
      if (df[i,j] !=1){
        df[,j]<- mean(df[,j])
      }
    }
    print(df)
  }
}
Your code was not working because varname holds a string, and filter() and summarise() treat it as that string rather than as the column it names. You can see this by printing the class of varname:
dtname = 'dt'
ind = c('1', '2')
for (d in ind) {
  df <- get(paste0('dt', d, sep=''))
  for (v in ind) {
    varname <- paste0('a', v, sep='')
    print(class(varname))
  }
}
This just prints "character" each time.
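Staying with dplyr, one way to tell filter() and summarise() that the string names a column is the .data pronoun. The following is a minimal sketch for a single table and a single column, assuming a reasonably recent dplyr; the result column name `mean` is chosen here for the illustration.

```r
library(dplyr)

dt1 <- data.frame(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
varname <- "a1"

res <- dt1 %>%
  # .data[[varname]] looks up the column whose name is stored in varname
  filter(.data[[varname]] != 1) %>%
  summarise(mean = mean(.data[[varname]]))

print(res)
```

In the original attempt, filter(varname != 1) compares the string "a1" with 1, which is TRUE for every row, and mean(varname) is the mean of a character value, which gives NA with a warning.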
Another solution using variable names and the data.frame type would be to index the df with the variable itself (no quotes, otherwise R looks for a column literally called "varname"):
df[[varname]]
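Put together with the loop from the question, this kind of indexing could look roughly like the sketch below. It uses base R only and assumes plain data.frames; the `means` vector and its labels are made up here just to collect the results.

```r
dt1 <- data.frame(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.frame(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

ind <- c("1", "2")
means <- numeric(0)  # hypothetical container for the four averages

for (d in ind) {
  df <- get(paste0("dt", d))
  for (v in ind) {
    varname <- paste0("a", v)
    col <- df[[varname]]          # column selected by its string name
    m <- mean(col[col != 1])      # average over the values that are not 1
    means[paste0("dt", d, "$", varname)] <- m
    cat("dt", d, " ", varname, ": ", m, "\n", sep = "")
  }
}
```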
Here are two helpful links for this kind of operation:
* link 1: How to find the mean of a column
* link 2: Data frames
Upvotes: 0
Reputation: 10543
For a purely data.table way, I would combine the different data.tables and compute the averages:
# Concatenate the data.tables:
all_dt <- rbind("dt1" = dt1, "dt2" = dt2, idcol = "origin")
all_dt
# origin a1 a2
# 1: dt1 1 4
# 2: dt1 2 5
# 3: dt1 3 2
# 4: dt2 1 52
# 5: dt2 43 4
# 6: dt2 1 1
# Melt so that "a1" and "a2" are labels in a group column:
all_dt <- melt(all_dt, id.vars="origin")
all_dt
# origin variable value
# 1: dt1 a1 1
# 2: dt1 a1 2
# 3: dt1 a1 3
# 4: dt2 a1 1
# 5: dt2 a1 43
# 6: dt2 a1 1
# 7: dt1 a2 4
# 8: dt1 a2 5
# 9: dt1 a2 2
# 10: dt2 a2 52
# 11: dt2 a2 4
# 12: dt2 a2 1
# Compute averages by each data.table and column group, ignoring 1s:
all_dt[value != 1, .(mean = mean(value)), by = .(origin, variable)]
# origin variable mean
# 1: dt1 a1 2.500000
# 2: dt2 a1 43.000000
# 3: dt1 a2 3.666667
# 4: dt2 a2 28.000000
Upvotes: 3
Reputation: 23797
Would go for a vectorised approach. You are using R!
One possible way:
library(dplyr)
dt1[dt1 == 1] <- NA  # replace 1 with NA (this overwrites dt1)
dt1 %>% summarise_all(mean, na.rm = TRUE)  # mean of all columns
#    a1       a2
# 1 2.5 3.666667
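If dt1 should stay unmodified, and both tables should be handled at once, the same vectorised idea can be sketched in base R. This is only an illustration under the assumption that the tables are collected in a named list; `tables` and `mean_without_ones` are names invented for the example.

```r
dt1 <- data.frame(a1 = c(1, 2, 3), a2 = c(4, 5, 2))
dt2 <- data.frame(a1 = c(1, 43, 1), a2 = c(52, 4, 1))

# Hypothetical names for this sketch
tables <- list(dt1 = dt1, dt2 = dt2)
mean_without_ones <- function(x) mean(x[x != 1])

# Inner sapply: one mean per column; outer sapply: one column of results per table
result <- sapply(tables, function(d) sapply(d, mean_without_ones))
print(result)
```

The result is a small matrix with one row per variable (a1, a2) and one column per table (dt1, dt2), and the input tables are left as they were.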
Upvotes: 0