Reputation: 133
I would like to loop over three data frames, create subsets of each, and give each new subset its own name. How can I loop over these three data frames while keeping their names?
For example, I have 3 data frames: apples, berries, and grapes. When making a loop, is there a way to assign the new subset data frames similar names to their respective original data frame?
Written out without a loop, this is what the code would look like.
apples <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
apples_large <- subset(apples, number > 2)
apples_small <- subset(apples, number < 2)
berries_large <- subset(berries, number > 2)
berries_small <- subset(berries, number < 2)
grapes_large <- subset(grapes, number > 2)
grapes_small <- subset(grapes, number < 2)
Upvotes: 4
Views: 465
Reputation: 2135
First, put your data.frames into a list, then define a function that classifies the rows. Now you can split each element of the list according to your classifier in an lapply.
fruits <- list(
  apples = data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3)),
  berries = data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3)),
  grapes = data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
)
clsfy <- function(num) {
  if (num > 2) {
    ret <- "Large"
  } else if (num < 2) {
    ret <- "Small"
  } else {
    ret <- NA ## if no condition is met, discard this row
  }
  return(ret)
}
fruits2 <- lapply(fruits, function(fr) {
  split(fr, sapply(fr$number, clsfy))
})
At this point, fruits2 looks like this:
> fruits2
$apples
$apples$Large
   type number
3 green      3
$apples$Small
  type number
1  red      1
$berries
$berries$Large
      type number
3 mulberry      3
$berries$Small
       type number
1 blueberry      1
$grapes
$grapes$Large
  type number
3 sour      3
$grapes$Small
  type number
1  red      1
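The rows with number equal to 2 are missing from the output because split drops every row whose grouping value is NA. A minimal sketch of that behaviour on its own, where df, groups, and parts are names made up for this illustration:

```r
# Hypothetical single data frame, same shape as the ones in the question
df <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))

# One grouping label per row; NA marks a row that should be discarded
groups <- c("Small", NA, "Large")

# split() silently drops rows whose grouping value is NA
parts <- split(df, groups)

names(parts)
sapply(parts, nrow)
```

Only the "Large" and "Small" pieces come back, so returning NA from the classifier is enough to discard a row.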
To generalize classifications using more than one column per row, you can use apply instead of sapply and re-define your clsfy function so that it takes the whole row: split(fr, apply(fr, 1, clsfy)). On the other hand, if your condition is really a simple binary, then ifelse is better than sapply(x$number, clsfy).
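As a rough sketch of the ifelse route, the classifier can work on the whole number column at once instead of one value at a time. The name clsfy_vec is an assumption for this example, and the nested ifelse keeps the "discard when equal to 2" rule from the code above:

```r
fruits <- list(
  apples = data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3)),
  berries = data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3)),
  grapes = data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))
)

# Hypothetical vectorized classifier: takes the whole column, returns one label per row
clsfy_vec <- function(num) {
  ifelse(num > 2, "Large",
         ifelse(num < 2, "Small", NA_character_))  # NA rows are dropped by split()
}

fruits2 <- lapply(fruits, function(fr) split(fr, clsfy_vec(fr$number)))
```

If you go the apply(fr, 1, clsfy) route instead, keep in mind that apply converts a data frame with mixed column types to a character matrix first, so a row-wise classifier would likely need as.numeric() on the number field.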
Upvotes: 1
Reputation: 887048
Place the dataset objects in a list and split by the 'number' column to get a nested list of datasets:
lapply(list(apples, berries, grapes), function(x) split(x, x$number>2))
If we create a named list, then it becomes easier to identify or extract the individual components:
out <- lapply(mget(c("apples", "berries", "grapes")),
function(x) split(x, c("small", "large")[(x$number > 2) + 1]))
out$apples$small
As @JonMinton mentioned, if we need to drop the rows where 'number' is 2, subset before splitting:
lapply(mget(c("apples", "berries", "grapes")),
function(x) {x1 <- subset(x, number != 2)
split(x1, c("small", "large")[(x1$number > 2) + 1])})
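If names like apples_large from the question are wanted, one possible follow-up is to flatten the nested list by one level and rename the elements. This is a sketch rather than part of the answer above: nested and flat are names invented here, and the list is built explicitly instead of with mget so that the example does not depend on the calling environment.

```r
apples <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

nested <- lapply(list(apples = apples, berries = berries, grapes = grapes),
                 function(x) {
                   x1 <- subset(x, number != 2)
                   split(x1, c("small", "large")[(x1$number > 2) + 1])
                 })

# Flatten one level: element names become "apples.large", "apples.small", ...
flat <- unlist(nested, recursive = FALSE)

# Swap the first "." for "_" to match the naming used in the question
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)

flat$apples_large
```

Everything still lives in a single list, so later steps can keep using lapply over flat instead of hunting for separate objects.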
Upvotes: 4
Reputation: 1279
It's a bad idea to create many objects in the global environment, rather than keeping them in a list, but this would do it:
tmp <- c("apples", "berries", "grapes")
for (i in 1:length(tmp)){
  assign(paste0("big_", tmp[i]), subset(get(tmp[i]), number > 2))
  assign(paste0("small_", tmp[i]), subset(get(tmp[i]), number < 2))
}
(or use seq_along(tmp) instead of 1:length(tmp))
Notice the use of assign for the outputs and get for the inputs.
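For reference, here is a variant of the same assign/get idea that produces the exact names from the question (apples_large, apples_small, and so on). It is only a sketch: it loops over the names directly rather than over an index, and the loop variable nm is an arbitrary choice.

```r
apples <- data.frame(type = c("red", "golden", "green"), number = c(1, 2, 3))
berries <- data.frame(type = c("blueberry", "raspberry", "mulberry"), number = c(1, 2, 3))
grapes <- data.frame(type = c("red", "green", "sour"), number = c(1, 2, 3))

tmp <- c("apples", "berries", "grapes")
for (nm in tmp) {
  # get() looks the data frame up by name; assign() creates the new object by name
  assign(paste0(nm, "_large"), subset(get(nm), number > 2))
  assign(paste0(nm, "_small"), subset(get(nm), number < 2))
}

apples_large
```

Looping over the names themselves also sidesteps the 1:length(tmp) pitfall when tmp is empty.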
Upvotes: 3