R starter

Reputation: 197

Write a function to subset a data frame based on multiple conditions

Here is an example of my data (the original data has 20 columns and 1350 rows):

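 a <- c("blue", "red", "green", "blue", "cyan")
 b <- c("red", "red", "green", "blue", "orange")
 # name the columns col1/col2, since the code below refers to them by those names;
 # stringsAsFactors = TRUE so droplevels() has factor levels to drop (R >= 4.0 defaults to FALSE)
 data <- data.frame(col1 = a, col2 = b, stringsAsFactors = TRUE)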

The following code works well. It (1) subsets the data frame based on the conditions below, (2) removes unused factor levels, and (3) turns the result into a 2-by-2 table.

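 # keep rows where both columns are "blue" or "red"
 blue.red <- subset(data, col1 %in% c("blue", "red") &
               col2 %in% c("blue", "red"))
 rem <- droplevels(blue.red)     # drop levels that no longer occur
 table(rem$col1, rem$col2)       # 2-by-2 contingency table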
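For reference, here is the working code run end to end on the five-row sample. It assumes the columns are named `col1` and `col2` and stored as factors (so that `droplevels()` has levels to drop); both are assumptions about how the real data is built.

```r
# Sample data (assumption: columns named col1/col2, stored as factors)
data <- data.frame(
  col1 = c("blue", "red", "green", "blue", "cyan"),
  col2 = c("red", "red", "green", "blue", "orange"),
  stringsAsFactors = TRUE
)

# 1. keep rows where both columns are "blue" or "red"
blue.red <- subset(data, col1 %in% c("blue", "red") &
                     col2 %in% c("blue", "red"))
# 2. drop the factor levels that no longer occur (green, cyan, orange)
rem <- droplevels(blue.red)
# 3. cross-tabulate the two columns: a 2-by-2 table
tab <- table(rem$col1, rem$col2)
print(tab)
```

Rows 1, 2 and 4 pass the filter, so the table counts blue/red, red/red and blue/blue once each, and red/blue zero times.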

Here I tried to write a function to achieve the same purpose as the code above.

 sub_fun <- function(data, i, j...){
   subs <-subset(data, col1 %in% c("i", "j") &
             col2 %in% c("i", "j"))
   rem <- droplevels(subs)
   return(table(rem$i, rem$j))
 }

 check <- sub_fun(data, "blue", "red")
 check1 <- sub_fun(data, "red", "green")

But the output tables are empty. How should I write a function to subset this data?

Upvotes: 0

Views: 64

Answers (1)

davide

Reputation: 325

Remove the quotation marks around i and j in your function body. Otherwise the condition compares col1 and col2 against the literal strings "i" and "j", so it keeps only observations containing "i" or "j" (none, in your data):
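A small sketch of the difference, using the sample data (assumed here to have columns `col1`/`col2`): quoting `"i"` and `"j"` makes `%in%` look for those one-letter strings, while the unquoted names are replaced by the values the variables hold.

```r
data <- data.frame(
  col1 = c("blue", "red", "green", "blue", "cyan"),
  col2 = c("red", "red", "green", "blue", "orange"),
  stringsAsFactors = TRUE
)
i <- "blue"
j <- "red"

# Quoted: compares against the literal strings "i" and "j", so nothing matches
quoted <- data$col1 %in% c("i", "j")
# Unquoted: compares against the values of i and j ("blue", "red")
unquoted <- data$col1 %in% c(i, j)

print(quoted)    # FALSE FALSE FALSE FALSE FALSE
print(unquoted)  # TRUE TRUE FALSE TRUE FALSE

# Number of rows each version of the condition keeps
nrow(subset(data, col1 %in% c("i", "j") & col2 %in% c("i", "j")))  # 0
nrow(subset(data, col1 %in% c(i, j) & col2 %in% c(i, j)))          # 3
```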

sub_fun <- function(data, i, j){
  subs <- subset(data, col1 %in% c(i, j) & col2 %in% c(i, j))
  rem <- droplevels(subs)
  # if you assume that only columns col1 & col2 are in data
  return(table(rem))
  # if you have more columns in data then:
  # return(table(rem[, c('col1', 'col2')]))
}
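As a usage sketch, here is the function above (using the multi-column `table()` variant) applied to both calls from the question, on the five-row sample with columns assumed to be named `col1`/`col2` and stored as factors.

```r
data <- data.frame(
  col1 = c("blue", "red", "green", "blue", "cyan"),
  col2 = c("red", "red", "green", "blue", "orange"),
  stringsAsFactors = TRUE
)

sub_fun <- function(data, i, j) {
  # i and j are unquoted, so their values ("blue", "red", ...) are used
  subs <- subset(data, col1 %in% c(i, j) & col2 %in% c(i, j))
  rem <- droplevels(subs)
  # tabulate only the two columns of interest
  table(rem[, c("col1", "col2")])
}

check <- sub_fun(data, "blue", "red")
check1 <- sub_fun(data, "red", "green")
print(check)
print(check1)
```

Because `subset()` evaluates the condition inside the data frame first, a column that happened to be named `i` or `j` would shadow the function arguments; with the column names used here that does not arise.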

That should do the trick.

Edit: the empty result comes from trying to extract i and j from rem, where i = 'blue' and j = 'red'. That does not make sense, because i and j are not column names of rem; in fact `rem$i` looks for a column literally named "i", which does not exist, so it returns NULL.
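A minimal sketch of that point: `$` takes the name after it literally, while `[[` evaluates its argument, so a column name stored in a variable needs `[[`. The variant `sub_fun2()` below is hypothetical (it is not part of the answer); it takes the column names as arguments, defaulting to `col1`/`col2`, and avoids `subset()` altogether, so it should also work on a wider data frame such as the original 20-column one.

```r
data <- data.frame(
  col1 = c("blue", "red", "green", "blue", "cyan"),
  col2 = c("red", "red", "green", "blue", "orange"),
  stringsAsFactors = TRUE
)

i <- "col1"
is.null(data$i)   # TRUE: `$` looks for a column literally named "i"
data[[i]]         # `[[` evaluates i, so this returns the col1 column

# Hypothetical generalisation: column names are passed in as strings
sub_fun2 <- function(data, values, cols = c("col1", "col2")) {
  # rows where both chosen columns take one of the requested values
  keep <- data[[cols[1]]] %in% values & data[[cols[2]]] %in% values
  # keep only the two columns, then drop unused factor levels
  rem <- droplevels(data[keep, cols, drop = FALSE])
  table(rem)
}

res <- sub_fun2(data, c("blue", "red"))
print(res)
```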

Upvotes: 1
