user0815

Reputation: 115

Combine list elements?

I've got two long lists A and B which have the same length but contain different numbers of equivalent elements:
List A can contain many elements which also can recur in the same field.
List B either contains only one element or an empty field, i.e. "character(0)".
A also contains some empty fields but for these records there's always an element present in B, so there are no records with empty fields in A and B.
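I want to combine the elements of A and B into a new list of the same length, C, according to the following rules: every element of A must be kept, including its recurrences within the same field, and the element of B is added only if it does not already occur in the corresponding field of A.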

This is an example of how these lists begin:

> A  
 [1] "JAMES" "JAMES"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"  
 [4] character(0)  
...  
> B  
 [1] "RICHARD"  
 [2] "JOHN"  
 [3] character(0)  
 [4] "CHARLES"  
...  

This is the correct output I'm looking for:

> C  
 [1] "JAMES" "JAMES" "RICHARD"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"  
 [4] "CHARLES"  
... 

I tried, e.g.:

C <- sapply(mapply(union, A,B), setdiff, character(0))  

But this deleted the recurrences from A, unfortunately:

> C  
 [1] "JAMES" "RICHARD"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "DAVID"  
 [4] "CHARLES"  
...  

Can anybody tell me, please, how to combine these two lists, preserve the recurrences from A, and achieve the output I desire?

Thank you very much in advance!

Update: Machine readable data:

A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"), 
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),  
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")

Upvotes: 4

Views: 5469

Answers (1)

Gavin Simpson

Reputation: 174788

Here is your snippet of data, in reproducible form:

A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"), 
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),  
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")

You were close with mapply(). I got the desired output by using c() to concatenate the list elements in A and B but had to manipulate elements of the supplied vectors, so I came up with this:

foo <- function(...) {
    l1 <- length(..1)
    l2 <- length(..2)
    out <- character(0)
    if(l1 > 0) {
        if(l2 > 0) {
            out <- if(..2 %in% ..1)
                ..1
            else
                c(..1, ..2)
        } else {
            out <-  ..1
        }
    } else {
        out <-  ..2
    }
    out
}

We can refer to the individual elements of ... using the ..n placeholders; ..1 is A and ..2 is B. Of course, foo() only works with two lists but doesn't enforce this or do any checking, just to keep things simple. foo() also needs to handle the cases where either A or B or both are character(0) which I now think foo() does.
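As a small illustration of the ..n placeholders, and of one way the two-argument assumption could be checked, here is a minimal sketch. The function name first_two() is hypothetical and not part of the answer above; ...length() is assumed to be available (R 3.5.0 or later).

```r
# Hypothetical helper: shows that ..1 and ..2 pick out the first and
# second arguments passed through '...', and refuses any other count.
first_two <- function(...) {
    if (...length() != 2L) {
        stop("exactly two arguments are required")
    }
    list(first = ..1, second = ..2)
}

res <- first_two(c("JAMES", "JAMES"), "RICHARD")
res$first   # the vector passed first
res$second  # the vector passed second
```

A guard like this could be placed at the top of foo() if silent misuse with more or fewer than two lists is a concern.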

When we use that in the mapply() call I get:

> mapply(foo, A, B)
[[1]]
[1] "JAMES"   "JAMES"   "RICHARD"

[[2]]
[1] "JOHN"   "ROBERT"

[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID"   "WILLIAM"

[[4]]
[1] "CHARLES"
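One caveat worth keeping in mind: mapply() returns a list here only because the combined elements have different lengths. By default it tries to simplify the result, so if every element happened to have the same length you would get a matrix instead. The sketch below uses made-up inputs (A2, B2 and combine() are hypothetical) to show the difference; passing SIMPLIFY = FALSE should always give a list.

```r
# Hypothetical inputs where every combined element has length 2
A2 <- list("a", "b")
B2 <- list("c", "d")
combine <- function(x, y) c(x, y)

# Default behaviour: results of equal length are collapsed into a matrix
simplified <- mapply(combine, A2, B2)

# SIMPLIFY = FALSE keeps one list element per input pair
kept_list <- mapply(combine, A2, B2, SIMPLIFY = FALSE)

is.matrix(simplified)
is.list(kept_list)
```

So for real data it is probably safer to call mapply(foo, A, B, SIMPLIFY = FALSE).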

An lapply() version may be more meaningful than the abstract ..n but uses essentially the same code. Here is a new function that works with A and B directly, but we iterate over the indices of the elements of A (1, 2, ..., length(A)) as generated by seq_along():

foo2 <- function(ind, A, B) {
    l1 <- length(A[[ind]])
    l2 <- length(B[[ind]])
    out <- character(0)
    if(l1 > 0) {
        if(l2 > 0) {
            out <- if(B[[ind]] %in% A[[ind]]) {
                A[[ind]]
            } else {
                c(A[[ind]], B[[ind]])
            }
        } else {
            out <- A[[ind]]
        }
    } else {
        out <- B[[ind]]
    }
    out
}

which is called like this:

> lapply(seq_along(A), foo2, A = A, B = B)
[[1]]
[1] "JAMES"   "JAMES"   "RICHARD"

[[2]]
[1] "JOHN"   "ROBERT"

[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID"   "WILLIAM"

[[4]]
[1] "CHARLES"
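For comparison, the same rule can be sketched more compactly with Map() and setdiff(). This is an alternative, not the approach used in foo() or foo2() above: setdiff(b, a) keeps the element of B only when it is absent from A, and c() then appends it without touching the recurrences in A. It assumes, as in the question, that each field of B holds at most one element.

```r
A <- list(c("JAMES", "JAMES"),
          c("JOHN", "ROBERT"),
          c("WILLIAM", "MICHAEL", "WILLIAM", "DAVID", "WILLIAM"),
          character(0))
B <- list("RICHARD", "JOHN", character(0), "CHARLES")

# Keep A as it is (duplicates included) and append only those
# elements of B that do not already occur in the matching field of A.
C <- Map(function(a, b) c(a, setdiff(b, a)), A, B)
C
```

Map() always returns a list, so there is no simplification to worry about, and the character(0) cases need no special handling because c() and setdiff() both treat an empty vector as nothing to add.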

Upvotes: 7
