How do you subset a data frame based on a variable name?

Question

My data frame is called d:

 dput(d)
structure(list(Hostname = structure(c(8L, 8L, 9L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L), .Label = c("db01", "db02", "farm01", "farm02", 
"tom01", "tom02", "tom03", "web01", "web03"), class = "factor"), 
    Date = structure(c(6L, 10L, 5L, 3L, 2L, 1L, 8L, 9L, 7L, 4L
    ), .Label = c("10/5/2015 1:15", "10/5/2015 1:30", "10/5/2015 2:15", 
    "10/5/2015 4:30", "10/5/2015 8:30", "10/5/2015 8:45", "10/6/2015 8:15", 
    "10/6/2015 8:30", "9/11/2015 5:00", "9/11/2015 6:00"), class = "factor"), 
    Cpubusy = c(31L, 20L, 30L, 20L, 18L, 20L, 41L, 21L, 29L, 
    24L), UsedPercentMemory = c(99L, 98L, 95L, 99L, 99L, 99L, 
    99L, 98L, 63L, 99L)), .Names = c("Hostname", "Date", "Cpubusy", 
"UsedPercentMemory"), class = "data.frame", row.names = c(NA, 
-10L))

In a loop, I need to go through this data frame based on the metrics variable and create a subset data frame for each metric, for summarization:

metrics<-as.vector(unique(colnames(d[,c(3:4)])))

for (m in metrics){
    sub<-d[,c(1,m)]
}

I cannot use m in this subset line. Any ideas how I could subset the data frame based on a variable name?

kliron · Accepted Answer

In your subsetting call you are mixing a column index with a column name. c(1, m) coerces the number 1 to the string "1", so R looks for a column named "1" and fails with an "undefined columns selected" error.

Either use column names:

for (m in metrics) {
  sub <- d[, c(colnames(d)[1], m)]
}

Or indexes:

for (i in 3:4) {
   sub <- d[, c(1, i)]
}
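
As a rough sketch of both points, the snippet below shows the coercion that trips up the original call and one way to keep every subset instead of overwriting sub on each pass. The three-row data frame is a cut-down stand-in for d, and the names mixed and subsets are made up for illustration.

```r
# Cut-down stand-in for d (three rows taken from the question's data)
d <- data.frame(
  Hostname = c("web01", "tom01", "db01"),
  Date = c("10/5/2015 8:45", "10/5/2015 2:15", "10/6/2015 8:30"),
  Cpubusy = c(31L, 20L, 41L),
  UsedPercentMemory = c(99L, 99L, 99L)
)
metrics <- colnames(d)[3:4]

# Mixing a number and a string in c() coerces everything to character,
# so R would look for a column literally named "1".
mixed <- c(1, metrics[1])

# Hypothetical container: keep each subset in a named list
# rather than overwriting a single variable inside the loop.
subsets <- list()
for (m in metrics) {
  subsets[[m]] <- d[, c("Hostname", m)]
}
```

Each element of subsets is then a two-column data frame, reachable by metric name, for example subsets[["Cpubusy"]].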

That said, for loops in R are mostly suited to dynamic assignment, calling functions for their side effects, or other relatively unusual cases. Building a summary by slicing a data frame inside a for loop is almost never the idiomatic way to do it in R. If the usual functional tools (lapply, sapply and friends) are not enough, packages such as plyr and dplyr let you split-apply-combine your data in convenient and idiomatic ways.
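
For a loop-free version in base R, something along these lines might do. It is only a sketch: the question does not say which summary is wanted, so mean is an assumption, the data frame is again a cut-down stand-in for d, and metric_means and per_host are illustrative names.

```r
# Cut-down stand-in for d (Date column omitted, columns selected by name)
d <- data.frame(
  Hostname = c("web01", "web01", "db01"),
  Cpubusy = c(31L, 20L, 41L),
  UsedPercentMemory = c(99L, 98L, 99L)
)
metrics <- c("Cpubusy", "UsedPercentMemory")

# One summary value per metric column, no explicit loop
# (mean is assumed here; swap in whatever summary is needed)
metric_means <- sapply(d[metrics], mean)

# Split-apply-combine in base R: mean of each metric per host
per_host <- aggregate(d[metrics], by = list(Hostname = d$Hostname), FUN = mean)
```

metric_means is a named numeric vector with one entry per metric, and per_host is a data frame with one row per Hostname.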
