user10156381

Reputation: 133

Loop over data frames while keeping variable names in R

I would like to loop over 3 data frames, create subsets of each, and give each new subset a new name. How can I loop over these three data frames while keeping track of their names?

For example, I have 3 data frames: apples, berries, and grapes. When writing the loop, is there a way to give the new subset data frames names derived from their respective original data frames?

Written out without a loop, this is what the code would look like.

apples <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

apples_large <- subset(apples, number > 2)
apples_small <- subset(apples, number < 2)

berries_large <- subset(berries, number > 2)
berries_small <- subset(berries, number < 2)

grapes_large <- subset(grapes, number > 2)
grapes_small <- subset(grapes, number < 2) 

Upvotes: 4

Views: 465

Answers (3)

flies

Reputation: 2135

First, put your data frames into a named list, then define a function that classifies each row. Now you can split each element of the list according to your classifier inside an lapply.

fruits <- list(
    apples=data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3)),
    berries=data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3)),
    grapes=data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
)

clsfy <- function(num) {
    if (num>2) {
        ret <- "Large"
    } else if (num<2) {
        ret <- "Small"
    } else {
        ret <- NA ## if no condition is met, discard this row
    }
    return(ret)
}

fruits2 <- lapply(fruits, function(fr) {
    split(fr, sapply(fr$number, clsfy))
})

At this point, fruits2 looks like this:

>     fruits2
$apples
$apples$Large
   type number
3 green      3

$apples$Small
  type number
1  red      1


$berries
$berries$Large
      type number
3 mulberry      3

$berries$Small
       type number
1 blueberry      1


$grapes
$grapes$Large
  type number
3 sour      3

$grapes$Small
  type number
1  red      1

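To generalize the classification so that it uses more than one column per row, you can use apply instead of sapply and redefine clsfy so that it takes a whole row: split(fr, apply(fr, 1, clsfy)). Be aware that apply converts the data frame to a matrix first, so when the columns have mixed types every value in the row reaches clsfy as a character string, and numeric columns need as.numeric() inside the function. On the other hand, if your condition really is a simple binary, then a vectorised ifelse is better than sapply(fr$number, clsfy).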
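A minimal sketch of the ifelse variant, assuming the same fruits list as above. Because this particular rule has three outcomes (large, small, or discard), the ifelse calls are nested; split() still drops the rows whose label is NA, exactly as with clsfy. The function name clsfy_vec is hypothetical.

```r
fruits <- list(
  apples  = data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3)),
  berries = data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3)),
  grapes  = data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
)

# Hypothetical vectorised classifier: labels the whole column in one call.
# Rows with number == 2 get NA, and split() discards NA groups.
clsfy_vec <- function(num) {
  ifelse(num > 2, "Large", ifelse(num < 2, "Small", NA))
}

fruits3 <- lapply(fruits, function(fr) split(fr, clsfy_vec(fr$number)))
fruits3$berries$Small
```

The result has the same nested structure as fruits2, but the labels are computed once per data frame instead of once per row.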

Upvotes: 1

akrun

Reputation: 887048

Place the dataset objects in a list and split each one by whether 'number' is greater than 2 to get a nested list of datasets:

lapply(list(apples, berries, grapes), function(x) split(x, x$number>2)) 
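Splitting by a logical vector names the groups "FALSE" and "TRUE", and because list() is unnamed, the outer elements can only be reached by position. A short sketch of how the pieces could be extracted, assuming the three data frames from the question:

```r
apples  <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes  <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

res <- lapply(list(apples, berries, grapes), function(x) split(x, x$number > 2))

names(res[[1]])      # "FALSE" "TRUE"
res[[1]][["TRUE"]]   # rows of apples with number > 2
res[[2]][["FALSE"]]  # rows of berries with number <= 2 (this includes number == 2)
```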

If we create a named list (mget returns one, named after the objects), it becomes easier to identify or extract the individual components:

out <- lapply(mget(c("apples", "berries", "grapes")),
  function(x) split(x, c("small", "large")[(x$number > 2) + 1]))
out$apples$small
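The expression c("small", "large")[(x$number > 2) + 1] turns the logical test into labels: FALSE + 1 is 1 and TRUE + 1 is 2, so each row picks the first or the second label. A small standalone illustration:

```r
number <- c(1, 2, 3)
idx <- (number > 2) + 1              # FALSE/TRUE become 0/1, so idx is 1 1 2
labels <- c("small", "large")[idx]   # index into the label vector
labels                               # "small" "small" "large"
```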

Note that this puts the rows where 'number' is 2 into "small". As @JonMinton mentioned, if we need to drop those rows (to match number < 2 in the question):

lapply(mget(c("apples", "berries", "grapes")), function(x) {
  x1 <- subset(x, number != 2)
  split(x1, c("small", "large")[(x1$number > 2) + 1])
})

Upvotes: 4

JonMinton

Reputation: 1279

It's a bad idea to create many objects in the global environment, rather than keeping them in a list, but this would do it:

tmp <- c("apples", "berries", "grapes")

for (i in 1:length(tmp)){
  assign(paste0("big_", tmp[i]), subset(get(tmp[i]), number > 2))
  assign(paste0("small_", tmp[i]), subset(get(tmp[i]), number < 2))
}

(or use seq_along(tmp) instead of 1:length(tmp), which also behaves correctly when tmp is empty)
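The difference shows up only when the vector is empty: 1:length(tmp) then counts down from 1 to 0, so the loop body would run twice with invalid indices, while seq_along(tmp) is empty and the loop is skipped.

```r
tmp <- character(0)
1:length(tmp)     # 1 0  -> the loop would run with i = 1 and i = 0
seq_along(tmp)    # integer(0) -> the loop body never runs

n_iter <- 0
for (i in seq_along(tmp)) n_iter <- n_iter + 1
n_iter            # 0
```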

Notice the use of assign for the outputs and get for the inputs.
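If separate objects really are needed, another option is to build a named list first and write it out in one step with list2env(), which creates one object per list element. A sketch that produces the exact names from the question (apples_large, apples_small, ...), assuming the three data frames already exist in the global environment; fruit_names and subsets are illustrative names:

```r
apples  <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes  <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

fruit_names <- c("apples", "berries", "grapes")
subsets <- list()
for (nm in fruit_names) {
  df <- get(nm)  # look up the data frame by its name
  subsets[[paste0(nm, "_large")]] <- subset(df, number > 2)
  subsets[[paste0(nm, "_small")]] <- subset(df, number < 2)
}

# Create one global object per list element, named after the list names
invisible(list2env(subsets, envir = globalenv()))
ls(envir = globalenv(), pattern = "_(large|small)$")
```

Keeping the subsets list around also means you can keep working with the list instead of the individual objects.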

Upvotes: 3
