jyson

Reputation: 285

Unexpected error when taking the set difference of data.frames in a list

I have data.frame objects in a list, and I want to take the setdiff of them conditionally. I sketched a function for this, but I get an error when taking the set difference of the data.frames. Specifically, I want to return a different data.frame depending on a condition. Can anyone suggest how to solve this efficiently?

Minimal example:

myList <- list(
  saved = data.frame(from=c(3,33,54,91), to=c(23,42,71,107), label=c("a1","a4","a7","a11"), SC=c(22,6,13,7)),
  droped = data.frame(from=c(25,33,47,74,91), to=c(29,42,51,81,107), label=c("a2","a4","a6","a8","a11"), SC=c(3,6,4,5,7))
)

Based on this input, I want to implement the following function (just a sketch):

library(dplyr)
func <- function(list, type=c("Bio", "Tech")) {
  type=match.arg(type)
  res <- ifelse(type=="Bio",
                res <- list[[1]],
                res <- setdiff(list[[1]], list[[2]]))
  return(res)
}

I got an error like this:

Error: not compatible: Factor levels not equal for column label

My desired output would be:

if type is "Bio" :

  from  to label SC
1    3  23    a1 22
2   33  42    a4  6
3   54  71    a7 13
4   91 107   a11  7

if type is "Tech" :

  from  to label SC
1    3  23    a1 22
3   54  71    a7 13

Can anyone point out how to fix this problem? How can I get my expected output more efficiently? Thanks a lot.

Upvotes: 0

Views: 200

Answers (1)

aichao

Reputation: 7435

The issue is that the label column in each of your data frames is a factor rather than a character vector, and the two factors have different levels, which is what the error message reports. To get what you want:

myList <- list(
  saved = data.frame(from=c(3,33,54,91), to=c(23,42,71,107), label=c("a1","a4","a7","a11"), SC=c(22,6,13,7), stringsAsFactors=FALSE),
  droped = data.frame(from=c(25,33,47,74,91), to=c(29,42,51,81,107), label=c("a2","a4","a6","a8","a11"), SC=c(3,6,4,5,7), stringsAsFactors=FALSE)
)

func <- function(list, type=c("Bio", "Tech")) {
  type=match.arg(type)
  if(type=="Bio") list[[1]] else setdiff(list[[1]], list[[2]])
}

Notes:

  1. Use stringsAsFactors=FALSE when constructing your data frames.

  2. The other issue is with your definition of func. ifelse is vectorized and returns a result shaped like its test, so with a scalar comparison of type it returns only the first element of the chosen data frame, that is, its first column. Use if-else instead in your func.

With this:

func(myList,"Bio")
##  from  to label SC
##1    3  23    a1 22
##2   33  42    a4  6
##3   54  71    a7 13
##4   91 107   a11  7
func(myList,"Tech")
##  from to label SC
##1    3 23    a1 22
##2   54 71    a7 13
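
To see the ifelse behaviour from note 2 in isolation, here is a minimal sketch. The small data frame df is a made-up stand-in for illustration, not your data:

```r
# Hypothetical stand-in data frame, for illustration only
df <- data.frame(from = c(3, 33), to = c(23, 42))

# ifelse() shapes its result like the test, which has length 1 here,
# so only the first element (column) of df survives, wrapped in a list
viaIfelse <- ifelse(TRUE, df, df[1, ])
class(viaIfelse)    # "list", not "data.frame"
length(viaIfelse)   # 1

# if/else evaluates exactly one branch and returns it unchanged
viaIf <- if (TRUE) df else df[1, ]
identical(viaIf, df)   # TRUE
```

So ifelse is the right tool for element-wise choices over a vector, while a single scalar condition that selects a whole object calls for if/else.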

If you do want to keep the label columns as factors, then you need to set the levels of these factors to be the union of the individual factor levels:

## This time with stringsAsFactors=TRUE
myList <- list(
  saved = data.frame(from=c(3,33,54,91), to=c(23,42,71,107), label=c("a1","a4","a7","a11"), SC=c(22,6,13,7), stringsAsFactors=TRUE),
  droped = data.frame(from=c(25,33,47,74,91), to=c(29,42,51,81,107), label=c("a2","a4","a6","a8","a11"), SC=c(3,6,4,5,7), stringsAsFactors=TRUE)
)

myLevels <- unique(c(levels(myList[[1]]$label),levels(myList[[2]]$label)))
##[1] "a1"  "a11" "a4"  "a7"  "a2"  "a6"  "a8" 
myList[[1]]$label <- factor(myList[[1]]$label,levels=myLevels)
myList[[2]]$label <- factor(myList[[2]]$label,levels=myLevels)

Then the above func will work as before.
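
If the list holds more than two data frames, the same level-harmonizing step can be wrapped in a small helper. This is only a sketch: unifyLevels is a name I made up, and it assumes the given column is a factor in every data frame of the list:

```r
# Hypothetical helper: give `column` the same factor levels in every
# data frame of `dfs` (the union of all levels, in order of appearance)
unifyLevels <- function(dfs, column) {
  allLevels <- unique(unlist(lapply(dfs, function(d) levels(d[[column]]))))
  lapply(dfs, function(d) {
    d[[column]] <- factor(d[[column]], levels = allLevels)
    d
  })
}

# Cut-down example data, for illustration only
exampleList <- list(
  saved  = data.frame(label = c("a1", "a4"), stringsAsFactors = TRUE),
  droped = data.frame(label = c("a2", "a4"), stringsAsFactors = TRUE)
)
exampleList <- unifyLevels(exampleList, "label")
levels(exampleList$saved$label)
##[1] "a1" "a4" "a2"
```

lapply keeps the list names, so the result can be passed to func in place of the original list.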

Upvotes: 1
