RTrain3k

Reputation: 867

Function to iteratively create subsets of data frame

I am trying to develop a function that creates a list of data frame subsets from a user-provided vector of columns and a list of values within each column to subset by.

Example data frame:

df <- data.frame(var1 = rep(1:3, each = 5),
                 var2 = rep(4:6, each = 5), 
                 var3 = rep(7:9, each = 5))

Vector of columns to subset: cols.df <- c(1,2,3)

List of values within each column to subset by: rows.df <- list(c(1:3), c(4:6), c(7:9))

Function to iteratively create a list of subsets:

subsetfcn <- function(data, cols, rowslist){

  df <- data 
  listofdfs <- list() # create list to contain subsets

  for(a in cols){
    for(rows in rowslist) {
      for(row in rows) {
        df <- df[df[ , a]==row, ]
        listofdfs[[row]] <- df
      }
    }
  }
  return(listofdfs)
}

results <- subsetfcn(df, cols.df, rows.df)

The expected output is a list of:

> df[df[ , 1]==1, ]
  var1 var2 var3
1    1    4    7
2    1    4    7
3    1    4    7
4    1    4    7
5    1    4    7
> df[df[ , 1]==2, ]
   var1 var2 var3
6     2    5    8
7     2    5    8
8     2    5    8
9     2    5    8
10    2    5    8
> df[df[ , 1]==3, ]
   var1 var2 var3
11    3    6    9
12    3    6    9
13    3    6    9
14    3    6    9
15    3    6    9
> 
> df[df[ , 2]==4, ]
  var1 var2 var3
1    1    4    7
2    1    4    7
3    1    4    7
4    1    4    7
5    1    4    7
> df[df[ , 2]==5, ]
   var1 var2 var3
6     2    5    8
7     2    5    8
8     2    5    8
9     2    5    8
10    2    5    8
> df[df[ , 2]==6, ]
   var1 var2 var3
11    3    6    9
12    3    6    9
13    3    6    9
14    3    6    9
15    3    6    9

etc....

As of now, the function returns a list of 9 data frames, but each has no rows. I'm not sure why the correct values are not being passed to a and row.

Upvotes: 2

Views: 209

Answers (1)

zx8754

Reputation: 56159

Using mapply to pair each column with its own vector of values:

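res <- unlist(
  # pair each column index with its own vector of values
  mapply(function(cols.df, rows.df){
    # one subset of df per value in that column
    lapply(rows.df, function(x){ df[ df[ , cols.df ] == x, ] })
  }, cols.df, rows.df, SIMPLIFY = FALSE),
  # flatten one level only, so each data frame stays intact
  recursive = FALSE)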


# check output
length(res)
# [1] 9

res[1:2]
# [[1]]
#   var1 var2 var3
# 1    1    4    7
# 2    1    4    7
# 3    1    4    7
# 4    1    4    7
# 5    1    4    7
# 
# [[2]]
#    var1 var2 var3
# 6     2    5    8
# 7     2    5    8
# 8     2    5    8
# 9     2    5    8
# 10    2    5    8
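
As for why the original loop returns empty data frames, three things seem to be going on. First, df is overwritten inside the loop, so after filtering on var1 == 1 the next filter (var1 == 2) runs on the already filtered data and leaves zero rows. Second, the nested for loops pair every column with every vector of values instead of pairing column i with vector i. Third, listofdfs[[row]] indexes the list by the value itself, so later iterations overwrite earlier entries. Below is a minimal sketch of a loop version that avoids these, assuming cols and rowslist have the same length and line up position by position; the names i, value and subset_df are my own.

```r
# Example data from the question
df <- data.frame(var1 = rep(1:3, each = 5),
                 var2 = rep(4:6, each = 5),
                 var3 = rep(7:9, each = 5))
cols.df <- c(1, 2, 3)
rows.df <- list(1:3, 4:6, 7:9)

subsetfcn <- function(data, cols, rowslist) {
  listofdfs <- list()
  # Walk the columns and their value vectors in parallel (assumes equal length)
  for (i in seq_along(cols)) {
    for (value in rowslist[[i]]) {
      # Always filter the original data, never a previously filtered copy
      subset_df <- data[data[, cols[i]] == value, ]
      # Append to the end of the list instead of indexing by the value
      listofdfs[[length(listofdfs) + 1]] <- subset_df
    }
  }
  listofdfs
}

results <- subsetfcn(df, cols.df, rows.df)
length(results)        # 9
sapply(results, nrow)  # 5 rows in each subset
```

The result should match the mapply version above in order: the three subsets for column 1, then column 2, then column 3.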

Upvotes: 2
