Remove common elements in Data Frame

Question

Starting a separate thread because the question is slightly different now (previous thread: "R: split data frame rows by space, remove common elements, put unequal length columns in new df"). I have a data frame with an arbitrary number of columns and want to remove ALL elements that are not unique to a single column. The suggestion was to use intersect, but that only removes elements that are present in all columns (see below). I need to remove any element that appears in more than one column. I also need a vectorized solution: right now I can do it, but only by tediously working with N separate vectors. Thanks!

This does the job, but only for elements that appear in every column:

df1 = structure(list(A = structure(1:3, .Label = c("R1", "R2", "R3"), class = "factor"), 
                    B = c("1 4 78 5 4 6 7 0", 
                          "2 3 76 8 2 1 8 0", 
                          "4 7 1 2"
                    )), .Names = c("A", "B"), row.names = c(NA, -3L), class = "data.frame")


s <- strsplit(df1$B, " ")
## find the intersection of all s
r <- Reduce(intersect, s)
## iterate over s, removing the intersection characters in r
l <- lapply(s, function(x) x[!x %in% r])
## reset the length of each vector in l to the length of the longest vector
## then create the new data frame
zz = setNames(as.data.frame(lapply(l, "length<-", max(sapply(l, length)))), letters[seq_along(l)])
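
To make the limitation concrete, here is a small check on the same sample strings (redefined so the snippet runs on its own; the name `shared` is only for illustration). The intersection across all three vectors is just "1", so values that appear in only two of the three vectors are left in place.

```r
# Sample strings from df1$B, one per future column
B <- c("1 4 78 5 4 6 7 0", "2 3 76 8 2 1 8 0", "4 7 1 2")
s <- strsplit(B, " ")

# Values present in every vector
r <- Reduce(intersect, s)

# Remove only those values, as in the code above
l <- lapply(s, function(x) x[!x %in% r])

# Values that still occur in more than one vector after the removal
shared <- names(which(table(unlist(lapply(l, unique))) > 1))
print(r)
print(shared)
```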

Edit: My apologies, I should have included the desired output. Here it is:

Col1 Col2 Col3
78   3    NA
5    76   NA
6    8    NA
NA   8    NA

Rorschach · Accepted Answer

You can tabulate the unique values from each vector in the list, so that each count is the number of vectors a value appears in, and then remove the values with counts greater than 1.

## TRUE for values that occur in only one of the vectors in s
tab <- table(unlist(lapply(s, unique))) < 2
## keep those values in each vector
lapply(s, function(x) x[tab[x]])
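
A minimal sketch of how this filter might be combined with the padding step from the question to produce the desired output. The names `keep`, `l`, `res` and the `Col1`..`Col3` labels are assumptions chosen for illustration, and the sample strings are redefined so the snippet runs on its own.

```r
# Sample strings from df1$B, one per future column
B <- c("1 4 78 5 4 6 7 0", "2 3 76 8 2 1 8 0", "4 7 1 2")
s <- strsplit(B, " ")

# Count how many vectors each value occurs in; unique() prevents repeats
# within one vector from inflating the count
keep <- table(unlist(lapply(s, unique))) == 1

# Keep only values found in exactly one vector (within-vector repeats stay)
l <- lapply(s, function(x) x[keep[x]])
names(l) <- paste0("Col", seq_along(l))

# Pad every vector with NA up to the longest length, then build the data frame
n <- max(lengths(l))
res <- as.data.frame(lapply(l, `length<-`, n), stringsAsFactors = FALSE)
print(res)
```

The values stay character strings because they come from strsplit; a vector whose values are all shared, like the third one here, becomes a column of NA.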
