user1358425

"Selective" join within data.frame?

I have two vectors of the same length. This is a simple example featuring four rows:

[1] green  
[2] black, yellow  
[3] orange, white, purple  
[4] NA  

[1] red  
[2] black  
[3] NA  
[4] blue  

Either vector can contain NAs, but in each row at least one of the two has a value. Each element of the first vector can hold one or more values, while each element of the second holds at most one. I want to "selectively" join these two vectors row by row so that the output is:

[1] green, red  
[2] black, yellow  
[3] orange, white, purple  
[4] blue  

This means the contents of the first vector must always be present in the output. If a row of the first vector is NA, it is replaced by the value in the same row of the second vector.
The value from the second vector is appended only if it isn't already present in the same row of the first vector. NAs in the second vector are ignored.

I tried:

merge(A,B)
merge(A, B, all=TRUE)
merge(A, B, all.x=TRUE)
merge(A, B, all.y=TRUE) 

But they all yield completely different results.

How can I achieve this "selective" join as described above?

Thank you very much in advance for your consideration!

Upvotes: 2

Views: 201

Answers (2)

thelatemail

Reputation: 93843

I'm not sure how you have this data stored in a data.frame, but if you put the data into two lists there is a way to do it. Here is my attempt (with credit to the comment suggestions below):

# get the data
a <- c("green", "black, yellow", "orange, white, purple", NA)
b <- c("red", "black", NA, "blue")

# strip any spaces first
a <- gsub("[[:space:]]+","",a)
b <- gsub("[[:space:]]+","",b)

# convert to lists
alist <- strsplit(a,",")
blist <- strsplit(b,",")

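# join the lists row by row; SIMPLIFY = FALSE keeps the result a list
# even when every row happens to end up with the same number of values
abjoin <- mapply(c, alist, blist, SIMPLIFY = FALSE)
# remove any duplicates and NAs
abjoin <- lapply(abjoin, function(x) unique(x[!is.na(x)]))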

# result
> abjoin
[[1]]
[1] "green" "red"  

[[2]]
[1] "black"  "yellow"

[[3]]
[1] "orange" "white"  "purple"

[[4]]
[1] "blue"

And to convert this into a vector with each row's colours separated by commas:

sapply(abjoin,paste,collapse=",")
#[1] "green,red"           "black,yellow"        "orange,white,purple"
#[4] "blue"  
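
The steps above could also be wrapped into one function. This is only a sketch: `selective_join` and its `sep` argument are names I made up for illustration, and it assumes both inputs are character vectors of the same length with comma-separated values. It trims whitespace around each value rather than stripping all spaces, so multi-word values would survive.

```r
# selective_join() is a hypothetical helper name, not part of base R
selective_join <- function(a, b, sep = ",") {
  # split each element on the separator and trim surrounding whitespace
  split_trim <- function(x) lapply(strsplit(x, sep, fixed = TRUE), trimws)
  # SIMPLIFY = FALSE keeps the result a list even if all rows have equal length
  joined <- mapply(c, split_trim(a), split_trim(b), SIMPLIFY = FALSE)
  # drop NAs and duplicates, then collapse each row back into one string
  vapply(joined,
         function(x) paste(unique(x[!is.na(x)]), collapse = ", "),
         character(1))
}

a <- c("green", "black, yellow", "orange, white, purple", NA)
b <- c("red", "black", NA, "blue")
result <- selective_join(a, b)
```

Because the result is a plain character vector of the same length as the inputs, it should be possible to assign it straight back to a data.frame column.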

Upvotes: 2

Prasad Chalasani

Reputation: 20282

You're essentially trying to do a "union, then throw away any NAs", so how about this one-liner?

A <- list('green', c('black', 'yellow'), c('orange', 'white', 'purple'), NA)

B <- list('red', 'black', NA, 'blue')

> lapply(mapply(union, A, B, SIMPLIFY = FALSE), setdiff, NA)
[[1]]
[1] "green" "red"

[[2]]
[1] "black"  "yellow"

[[3]]
[1] "orange" "white"  "purple"

[[4]]
[1] "blue"
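
One thing to watch with `mapply`: by default it tries to simplify its result, so if every row happens to produce the same number of values you get a matrix back instead of a list. The sketch below uses made-up inputs (`A2`, `B2`) just to show the difference; passing `SIMPLIFY = FALSE` appears to be the safer choice when the row lengths are not known in advance.

```r
# A2 and B2 are illustrative inputs where every union has exactly two values
A2 <- list('green', 'black')
B2 <- list('red', 'blue')

# default behaviour: equal-length results are simplified to a character matrix
simplified <- mapply(union, A2, B2)

# SIMPLIFY = FALSE always returns a list, one element per row
kept <- mapply(union, A2, B2, SIMPLIFY = FALSE)

# the NA-dropping step then works row by row on the list
cleaned <- lapply(kept, setdiff, NA)
```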

Upvotes: 3
