Combine list elements?

Question

I've got two long lists A and B which have the same length but contain different numbers of equivalent elements:
List A can contain many elements which also can recur in the same field.
List B either contains only one element or an empty field, i.e. "character(0)".
A also contains some empty fields but for these records there's always an element present in B, so there are no records with empty fields in A and B.
I want to combine the elements of A and B into a new list of the same length, C, according to the following rules:

All elements from A have to be present in C - including their potential recurrences in the same field.
If B contains an element which isn't already present in A of the same record it'll be added to C as well.
But if B contains an element which already is present in A of the same record it'll be ignored.
If A has an empty field the element from B for this record will be added to C.
If B has an empty field the element(s) from A for this record will be added to C.

This is an example of how these lists begin:

> A  
 [1] "JAMES" "JAMES"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"  
 [4] character(0)  
...  
> B  
 [1] "RICHARD"  
 [2] "JOHN"  
 [3] character(0)  
 [4] "CHARLES"  
...

This is the correct output I'm looking for:

> C  
 [1] "JAMES" "JAMES" "RICHARD"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"  
 [4] "CHARLES"  
...

I tried, e.g.:

C <- sapply(mapply(union, A,B), setdiff, character(0))

But this deleted the recurrences from A, unfortunately:

> C  
 [1] "JAMES" "RICHARD"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "DAVID"  
 [4] "CHARLES"  
...

Can anybody tell me, please, how to combine these two lists, preserve the recurrences from A, and achieve the output I desire?

Thank you very much in advance!

Update: Machine readable data:

A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"), 
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),  
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")

Gavin Simpson · Accepted Answer

Here is your snippet of data, in reproducible form:

A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"), 
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),  
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")
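As an aside on why the union() attempt in the question lost the repeated names: union() is a set operation, so it returns each distinct value only once, whereas c() simply concatenates. A minimal illustration using the first record (the variable names a, b, via_union and via_c are only for this sketch):

```r
# First record from the question
a <- c("JAMES", "JAMES")
b <- "RICHARD"

# union() treats its inputs as sets, so the repeated "JAMES" is collapsed
via_union <- union(a, b)

# c() is plain concatenation and keeps every element of a
via_c <- c(a, b)

print(via_union)
print(via_c)
```

So any solution that must preserve recurrences in A has to build the result with c() and decide separately whether the element from B is appended.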

You were close with mapply(). I got the desired output by using c() to concatenate the list elements in A and B but had to manipulate elements of the supplied vectors, so I came up with this:

foo <- function(...) {
    l1 <- length(..1)
    l2 <- length(..2)
    out <- character(0)
    if(l1 > 0) {
        if(l2 > 0) {
            out <- if(..2 %in% ..1)
                ..1
            else
                c(..1, ..2)
        } else {
            out <-  ..1
        }
    } else {
        out <-  ..2
    }
    out
}

We can refer to the individual elements of ... using the ..n placeholders; inside each call made by mapply(), ..1 is the current element of A and ..2 is the current element of B. Of course, foo() only works with two lists but doesn't enforce this or do any checking, just to keep things simple. foo() also needs to handle the cases where either A or B or both are character(0), which I now think it does.
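If the ..n notation is unfamiliar, here is a small sketch of how it behaves. The function show_dots() is hypothetical and exists only for illustration; the nargs() check is one possible way to enforce the two-argument assumption that foo() leaves unchecked.

```r
# Hypothetical helper: returns the first two arguments passed via ...
show_dots <- function(...) {
    # nargs() counts the arguments actually supplied, including those in ...
    stopifnot(nargs() == 2L)
    list(first = ..1, second = ..2)
}

res <- show_dots(c("JAMES", "JAMES"), "RICHARD")
print(res$first)   # the first argument, unchanged
print(res$second)  # the second argument, unchanged
```

Calling show_dots() with one or three arguments stops with an error instead of silently using only part of the input.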

When we use that in the mapply() call I get:

> mapply(foo, A, B)
[[1]]
[1] "JAMES"   "JAMES"   "RICHARD"

[[2]]
[1] "JOHN"   "ROBERT"

[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID"   "WILLIAM"

[[4]]
[1] "CHARLES"
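One thing to watch for: mapply() simplifies its result by default, so if every record happens to produce a vector of the same length you get a matrix rather than a list. Passing SIMPLIFY = FALSE avoids that. A sketch with made-up data (A2, B2 and combine_pair() are assumptions for this example, with combine_pair() standing in for foo()):

```r
# Stand-in for foo(): keep all of a, append b only if it is not already in a
combine_pair <- function(a, b) c(a, b[!(b %in% a)])

# Made-up records where both results have length 2
A2 <- list(c("x", "x"), "y")
B2 <- list("x", "z")

simplified <- mapply(combine_pair, A2, B2)                    # character matrix
as_list    <- mapply(combine_pair, A2, B2, SIMPLIFY = FALSE)  # always a list

print(is.matrix(simplified))
print(is.list(as_list))
```

With the data in the question the results have different lengths, which is why the call above returns a list without the extra argument.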

An lapply() version may be more meaningful than the abstract ..n but uses essentially the same code. Here is a new function that works with A and B directly; we iterate over the indices of the elements of A (1, 2, ..., length(A)) as generated by seq_along():

foo2 <- function(ind, A, B) {
    l1 <- length(A[[ind]])
    l2 <- length(B[[ind]])
    out <- character(0)
    if(l1 > 0) {
        if(l2 > 0) {
            out <- if(B[[ind]] %in% A[[ind]]) {
                A[[ind]]
            } else {
                c(A[[ind]], B[[ind]])
            }
        } else {
            out <- A[[ind]]
        }
    } else {
        out <- B[[ind]]
    }
    out
}

which is called like this:

> lapply(seq_along(A), foo2, A = A, B = B)
[[1]]
[1] "JAMES"   "JAMES"   "RICHARD"

[[2]]
[1] "JOHN"   "ROBERT"

[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID"   "WILLIAM"

[[4]]
[1] "CHARLES"
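For comparison, the same rules can probably be written more compactly. This is a sketch rather than a drop-in replacement for the functions above: it relies on the fact that subsetting and c() handle character(0) gracefully, and it uses Map(), which always returns a list. The name combine_pair() is made up for this example.

```r
A <- list(c("JAMES", "JAMES"),
          c("JOHN", "ROBERT"),
          c("WILLIAM", "MICHAEL", "WILLIAM", "DAVID", "WILLIAM"),
          character(0))
B <- list("RICHARD", "JOHN", character(0), "CHARLES")

# Keep everything from a; append only those elements of b not already in a.
# If b is character(0) nothing is appended; if a is character(0) b is returned.
combine_pair <- function(a, b) c(a, b[!(b %in% a)])

C <- Map(combine_pair, A, B)
print(C)
```

Unlike foo() and foo2(), this version would also cope with B holding more than one element per record, although the question says that does not occur.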
