R: How do you subset all data-frames within a list?

Question

I have a list of data-frames called WaFramesCosts. I want to simply subset it to show specific columns so that I can then export them. I have tried:

    for (i in names(WaFramesCosts)) {
      WaFramesCosts[[i]][,c("Cost_Center","Domestic_Anytime_Min_Used","Department",
                            "Domestic_Anytime_Min_Used")]
    }

but it returns the error of

Error in `[.data.frame`(WaFramesCosts[[i]], , c("Cost_Center", "Department",  : 
  undefined columns selected

I also tried:

for (i in seq_along(WaFramesCosts)) {
  WaFramesCosts[[i]][ , -which(names(WaFramesCosts[[i]]) %in% c("Cost_Center","Domestic_Anytime_Min_Used","Department",
    "Domestic_Anytime_Min_Used"))]
}

but I get the same error. Can anyone see what I am doing wrong?

Side Note: For reference, I used this:

for (i in seq_along(WaFramesCosts)) {
    t <- WaFramesCosts[[i]][ , grepl( "Domestic" , names( WaFramesCosts[[i]] ) )] 
    q <- subset(WaFramesCosts[[i]], select = c("Cost_Center","Domestic_Anytime_Min_Used","Department","Domestic_Anytime_Min_Used"))  

    WaFramesCosts[[i]] <- merge(q,t)
  }

while attempting the same goal with a different approach, and it seemed to get closer.

r2evans · Accepted Answer

Welcome back, Kootseeahknee. You are still incorrectly assuming that the last command of a for loop is implicitly returned at the end. If you want that behavior, perhaps you want lapply:

myoutput <- lapply(names(WaFramesCosts), function(i) {
  WaFramesCosts[[i]][,c("Cost_Center","Domestic_Anytime_Min_Used","Department","Domestic_Anytime_Min_Used")]
})
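
To make the difference concrete, here is a minimal sketch using a made-up list of two small data frames. The object `frames` and everything in it are invented for illustration and are not your data. It shows that the for loop leaves the list untouched unless you assign inside it, and that lapply collects the results for you:

```r
# Toy stand-in for WaFramesCosts; names and values are invented.
frames <- list(
  a = data.frame(Cost_Center = 1:2, Department = c("x", "y"), Other = 3:4),
  b = data.frame(Cost_Center = 5:6, Department = c("z", "w"), Other = 7:8)
)

# A for loop evaluates the subset and then throws the result away.
for (i in names(frames)) {
  frames[[i]][, c("Cost_Center", "Department")]
}
ncol(frames$a)   # still 3; nothing was stored

# Option 1: assign back explicitly inside the loop.
kept <- frames
for (i in names(kept)) {
  kept[[i]] <- kept[[i]][, c("Cost_Center", "Department")]
}

# Option 2: let lapply collect the results. Iterating over the list itself
# (rather than over its names) keeps the element names on the result.
kept2 <- lapply(frames, function(df) df[, c("Cost_Center", "Department")])
names(kept2)     # "a" "b"
```

One design note: lapply over names(...) returns an unnamed list, so if you need the names later (for example as file names when exporting), either iterate over the list directly as above or restore them afterwards with names(myoutput) <- names(WaFramesCosts).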

The undefined columns selected error tells me that your assumptions about the data frames are not correct: at least one of them is missing at least one of the columns. From your previous question (How to do a complex edit of columns of all data frames in a list?), I'm inferring that you want whichever of those columns are present, rather than assuming every column exists in every data frame. In that case, you could/should be using grep or some variant:

myoutput <- lapply(names(WaFramesCosts), function(i) {
  WaFramesCosts[[i]][,grep("(Cost_Center|Domestic_Anytime_Min_Used|Department)", 
                           colnames(WaFramesCosts[[i]])),drop=FALSE]
})

This will match column names that contain any of those strings. You can be a lot more precise by ensuring whole strings or start/end matches occur by using regular expressions. For instance, changing from (Cost|Dom) (anything that contains "Cost" or "Dom") to (^Cost|Dom) means anything that starts with "Cost" or contains "Dom"; similarly, (Cost|ment$) matches anything that contains "Cost" or ends with "ment". If, however, you always want exact matches and just need those that exist, then something like this will work:

myoutput <- lapply(names(WaFramesCosts), function(i) {
  WaFramesCosts[[i]][,intersect(c("Cost_Center","Domestic_Anytime_Min_Used","Department"),
                                colnames(WaFramesCosts[[i]])),drop=FALSE]
})

Note the drop=FALSE in the last two examples: mtcars[,2] returns a vector, whereas mtcars[,2,drop=FALSE] returns a data.frame with one column. As defensive programming, if you think it at all possible that your filtering will return a single column, append ,drop=FALSE to your bracket-subsetting so that you do not inadvertently convert the result to a vector.
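
As a rough illustration of the points above (pattern matching versus exact matching, and the effect of drop=FALSE), here is a small self-contained sketch. The list `frames`, the vector `wanted`, and the column Department_Code are all invented for the example; the second data frame deliberately lacks a Department column:

```r
# Invented example data: the second frame has no Department column.
frames <- list(
  a = data.frame(Cost_Center = 1:2, Department = c("x", "y"),
                 Department_Code = c(10, 20)),
  b = data.frame(Cost_Center = 3:4, Extra = c(TRUE, FALSE))
)
wanted <- c("Cost_Center", "Department")

# Unanchored pattern: matches any name containing one of the strings,
# so Department_Code is picked up as well.
loose <- grep("(Cost_Center|Department)", names(frames$a), value = TRUE)

# Anchored pattern: ^ and $ force whole-name matches only.
strict <- grep("^(Cost_Center|Department)$", names(frames$a), value = TRUE)

# intersect() keeps only the wanted names that actually exist in each frame,
# so a missing column no longer triggers "undefined columns selected".
subsets <- lapply(frames, function(df) {
  df[, intersect(wanted, names(df)), drop = FALSE]
})

names(subsets$a)                   # "Cost_Center" "Department"
class(subsets$b)                   # "data.frame", even with one column left
class(frames$b[, "Cost_Center"])   # "integer": without drop=FALSE it collapses
```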
