Mike

Reputation: 1141

Find repeated elements in a list and remove those objects

I have a long list in which each element is itself a list containing a header and data. Some of the elements are repeated, and I'd like to find the repeats and remove them.

Ideally this would find elements that are identical in both name and contents. If both the name and the contents match, the repeat is removed. If the name is the same but the contents differ, the element is renamed.

Alternatively I'd settle for finding names that are repeated and removing the objects without checking their content.
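The name-only fallback amounts to keeping the first element seen for each name. A minimal base R sketch, assuming the first occurrence is the one to keep (`my.list.first` is a hypothetical name; the example list from below is repeated so the block runs on its own):

```r
# Example list from the question
my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

# duplicated() on the names flags every occurrence of a name after the first,
# so negating it keeps only the first element for each name (contents unchecked)
my.list.first <- my.list[!duplicated(names(my.list))]
names(my.list.first)
```

Because the contents are never compared, the second `sample2` is dropped here even though its data differ from the first.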

Here's a simplified example:

my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

table(names(my.list))

sample1 sample2 sample3 
      2       2       1 
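`table()` gives the counts; to see which positions share a name, `duplicated()` can be run from both ends. A base R sketch, assuming only the names vector of `my.list` is needed (`nm`, `pos` and `groups` are illustrative names):

```r
# The names of my.list from the example above
nm <- c("sample1", "sample2", "sample3", "sample1", "sample2")

# TRUE for every element whose name occurs more than once:
# duplicated() flags later copies, fromLast = TRUE flags earlier ones
repeated <- duplicated(nm) | duplicated(nm, fromLast = TRUE)
pos <- which(repeated)
pos
# → 1 2 4 5

# Group the positions by name, keeping only the names that repeat
groups <- split(seq_along(nm), nm)[unique(nm[repeated])]
groups
```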

In the above example, the second sample1 would be removed, but the second sample2 would be renamed, e.g. to sample2_2.

I've searched, but can't find an example in which the elements are themselves lists, and the other solutions I found don't seem to cover this case, e.g. "Remove duplicate in a large list while keeping the named number in R".

Upvotes: 1

Views: 56

Answers (2)

Kyle Kimler

Reputation: 46

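I would convert it to a data.frame with

library(dplyr)
do.call(rbind, unname(my.list)) %>% data.frame

then we can find the distinct rows with dplyr::distinct:

do.call(rbind, unname(my.list)) %>% data.frame %>% distinct

Note that unname() discards the sample names, so this removes elements with duplicate contents but does not rename anything.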

Upvotes: 0

Dubukay

Reputation: 2071

This is relatively simple to do in two steps, though I'm not sure it can be done in one. The first step removes the duplicates (with duplicated()), and the second repairs the names (with make.names()):

my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

my.list.dedup <- my.list[!duplicated(my.list)]
names(my.list.dedup) <- make.names(names(my.list.dedup), unique = TRUE)

after which my.list.dedup contains:

list(
  sample1 = list(
    header = c("a", "b", "c", "k"),
    data = c("a", "b", "c", "k")
  ),
  sample2 = list(
    header = c("d", "k", "x"),
    data = c("d", "k", "x")
  ),
  sample3 = list(
    header = c("z", "r", "v"),
    data = c("z", "r", "v")
  ),
  sample2.1 = list(
    header = c("h", "j", "l"),
    data = c("h", "j", "l")
  )
)
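`duplicated()` on a list compares only the elements' contents, not the names attached to them, so an element with a different name but identical contents would also be dropped. If both the name and the contents must match before an element is removed, as the question asks, one option is to pair each name with its contents before calling `duplicated()`, then number the repeated names in the question's `_2` style. A sketch under those assumptions (`keys`, `dedup` and the suffix scheme are illustrative, not part of the answer above):

```r
my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

# Pair each name with its contents so duplicated() only flags
# elements where BOTH the name and the contents match an earlier one
keys <- Map(list, names(my.list), my.list)
dedup <- my.list[!duplicated(keys)]

# Number each remaining name by occurrence: 1 for the first, 2 for the second, ...
nm <- names(dedup)
occurrence <- ave(seq_along(nm), nm, FUN = seq_along)

# Leave first occurrences alone, append "_<n>" to later ones
names(dedup) <- ifelse(occurrence > 1, paste0(nm, "_", occurrence), nm)
names(dedup)
# → "sample1" "sample2" "sample3" "sample2_2"
```

make.unique(nm, sep = "_") is a built-in alternative for the renaming step, though it numbers from _1 rather than _2. Neither approach checks whether a generated name such as sample2_2 already exists in the list.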

Upvotes: 2
