Carly

Reputation: 43

List inside of data frame cell, how to extract unique lists? R

I am trying to create a data frame in which some cells have a list of strings, while others have a single string. Ideally, from this data frame, I would then be able to extract all unique lists into a new list, vector, or one-row data frame. Any tips? Reprex below:

Data frame with some lists of strings within cells:

require(stringr)
    
table3 <- data.frame(U1 = I(list(c("b", "d"),
                                 c("d"),
                                 c(NA))),
                     U2 = I(list(c("a", "b", "d"),
                                 c("b"),
                                 c("b", "d"))),
                     U3 = I(list(c(99),
                                 c("a"),
                                 c("a"))),
                     U4 = I(list(c("a"),
                                 c(NA),
                                 c(NA))))
rownames(table3) <- c("C1", "C2", "C3")

What I want the output to look like:

table3.elem <- data.frame(C = I(list(99, "a", "b", "d", c("b","d"), c("a", "b", "d"))))

Ultimately, I'm trying to reproduce the calculations for Krippendorff's alpha for multi-valued data, published in Krippendorff & Cragg (2016). Unfortunately, their downloadable Java program for this version of Krippendorff's alpha no longer runs on my computer, so I'm trying to write an R version that I can at least use myself (and hopefully others can too, if I get it working).

Thank you!

Upvotes: 0

Views: 1151

Answers (2)

GKi

Reputation: 39687

You can call unlist with recursive = FALSE, keep the unique elements, and drop the NA entries to extract the unique lists.

x <- unique(unlist(table3, FALSE))
x <- x[!is.na(x)]
x <- x[order(lengths(x), sapply(x, paste, collapse = ""))]  # optional: sort by length, then by content
data.frame(C = I(x))
#        C
#1      99
#2       a
#3       b
#4       d
#5    b, d
#6 a, b, d
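
To see why the non-recursive form matters, here is a minimal sketch on a smaller, made-up data frame (the names df, flat, cells and u are hypothetical and not part of the question's data). A fully recursive unlist flattens every cell into one atomic vector, while recursive = FALSE removes only one level and keeps each cell intact:

```r
# Hypothetical two-column data frame with list cells
df <- data.frame(A = I(list(c("b", "d"), "d")),
                 B = I(list("d", NA)))

# Recursive unlist: every cell is flattened into one character vector,
# so the grouping c("b", "d") is lost.
flat <- unlist(df)
unique(unname(flat))   # "b" "d" NA

# Non-recursive unlist: only the column level is removed,
# leaving a plain list with one element per cell.
cells <- unlist(df, recursive = FALSE)
length(cells)          # 4

# unique() compares whole list elements, so c("b", "d") stays one entry.
# is.na() on a list is TRUE only for length-one NA elements.
u <- unique(cells)
u <- u[!is.na(u)]
```

Here u should end up holding two entries, c("b", "d") and "d".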

Upvotes: 3

akrun

Reputation: 887251

One option is:

  1. Convert the data.frame into a list (unclass)
  2. Flatten the list (do.call + c)
  3. Get the unique list elements
  4. Filter out the list elements that are NA
  5. Create a data.frame with a list column

out <- data.frame(C = I(Filter(function(x) all(complete.cases(x)),
                               unique(do.call(c, unclass(table3))))))
out <- out[order(lengths(out$C), !sapply(out$C, is.numeric),
                 sapply(out$C, head, 1)), , drop = FALSE]
row.names(out) <- NULL

Output:

> out
        C
1      99
2       a
3       b
4       d
5    b, d
6 a, b, d
> str(out)
'data.frame':   6 obs. of  1 variable:
 $ C:List of 6
  ..$ : num 99
  ..$ : chr "a"
  ..$ : chr "b"
  ..$ : chr "d"
  ..$ : chr  "b" "d"
  ..$ : chr  "a" "b" "d"
  ..- attr(*, "class")= chr "AsIs"
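
If this is needed more than once, the steps can be wrapped in a small function. This is only a sketch: unique_cells and its ignore_order argument are hypothetical names, and it assumes every cell is an atomic vector. The ignore_order option rests on a further assumption that, for multi-valued data, c("d", "b") and c("b", "d") should count as the same set; leave it at FALSE if order matters.

```r
# Hypothetical helper: collect the unique, non-NA cells of a data frame
# whose columns are all list columns (each cell an atomic vector).
unique_cells <- function(df, ignore_order = FALSE) {
  cells <- do.call(c, unclass(df))                # flatten columns into one list of cells
  cells <- Filter(function(x) !anyNA(x), cells)   # drop cells that contain NA
  if (ignore_order) {
    cells <- lapply(cells, sort)                  # c("d", "b") becomes c("b", "d")
  }
  unique(unname(cells))
}

# Made-up example data, not the table3 from the question
demo <- data.frame(U1 = I(list(c("b", "d"), "d")),
                   U2 = I(list(c("d", "b"), NA)))

n_ordered   <- length(unique_cells(demo))                       # 3
n_unordered <- length(unique_cells(demo, ignore_order = TRUE))  # 2
```

The result is a plain list, so it can be passed to data.frame(C = I(...)) as above.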

Upvotes: 4
