PatraoPedro
PatraoPedro

Reputation: 197

Match columns between two different data.frames using three different fields

Here’s my problem which has been driving me crazy all morning.

So, I have two tables “vessel” and “target”

v_registry<-c("","GBR000B11824","GBR000B10110","GBR000C17779","","GBR000C16255")
   v_pln<-c("WH4","","BRD5","B291","LI8","UL78")
   v_rss<-c("C19926","B11824","","C17779","A16190","C16255")
   v_asset<- c(104892,104902,104905,104916,104919,104920)
   vessel<-data.frame(v_registry,v_pln,v_rss,v_asset,stringsAsFactors=FALSE)

   t_registry<-c("GBR000C19926","GBR000B11824","","","GBR000A16190","")
   t_pln<-c("","","BRD5","B291","LI8","")
   t_rss<-c("C19926","","","","","C16255")
   target<-data.frame(t_registry,t_pln,t_rss,stringsAsFactors=FALSE)


  target<-target[sample(nrow(target)),] 

The vessel table has ID information about vessels. The target table is quite wide with lots of other data not needed here in the example. What I’m trying to achieve is to copy across the “t_asset” column (which is the only complete ID field) to the target table. The problem is that none of my tables is complete and I need to make this operation based on three different fields.

Below are a couple of attempts to do it. The sample line is just to shuffle it because for some strange reason if it is ordered it will work. The second attempt returns only a logical value and I haven’t managed to get the element instead of the logical value.

 #Attempt 1
 target$t_asset<-
 vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")|
                match(target$t_pln,vessel$v_pln,incomparables = "")|
                match(target$t_rss,vessel$v_rss,incomparables = "")]  

 #Attempt 2
 target$t_asset<-
 (vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")]|
  vessel$v_asset[match(target$t_pln,vessel$v_pln,incomparables = "")]|
  vessel$v_asset[match(target$t_rss,vessel$v_rss,incomparables = "")])   

The expected output is(rows might look different because of the shuffle):

> target
    t_registry t_pln  t_rss t_asset
1 GBR000C19926       C19926  104892
2 GBR000B11824               104902
3               BRD5         104905
4               B291         104916
5 GBR000A16190   LI8         104919
6                    C16255  104920

Any ideas on how to solve it?

Cheers

Upvotes: 1

Views: 64

Answers (3)

PatraoPedro
PatraoPedro

Reputation: 197

Both, earlier answers solved the example given. However, when applied in the real data set for some reason both gave errors.

So, finally I've manged to get some code giving the right answers and tested in the real data set. However, the code is not pretty and I'm sure it could be made more efficient.

# Creates three new columns each with an idependent match
target$t_asset_registry<-vessel$v_asset[match(target$t_registry,vessel$v_registry,incomparables = "")] 
target$t_asset_pln<-vessel$v_asset[match(target$t_pln,vessel$v_pln,incomparables = "")]
target$t_asset_rss<-vessel$v_asset[match(target$t_rss,vessel$v_rss,incomparables = "")]    

# an if statment to sumarize the results  
target$asset<-ifelse(is.na(target$t_asset_registry),
                ifelse(is.na(target$t_asset_pln),
                  ifelse(is.na(target$t_asset_rss),NA,target$t_asset_rss),
                  target$t_asset_pln),target$t_asset_registry) 

The output is:

> target
    t_registry t_pln  t_rss t_asset_registry t_asset_pln t_asset_rss  asset
4               B291                      NA      104916          NA 104916
3               BRD5                      NA      104905          NA 104905
6                    C16255               NA          NA      104920 104920
5 GBR000A16190   LI8                      NA      104919          NA 104919
1 GBR000C19926       C19926               NA          NA      104892 104892
2 GBR000B11824                        104902          NA          NA 104902

Looking to the output it is clear what I was trying to achieve.If anyone as a clever way of accomplish the same result please post it.

Thanks to all helping out

Upvotes: 0

Zelazny7
Zelazny7

Reputation: 40648

Using merge:

target$t_asset <- merge(target, vessel, by.x=1:3, by.y=1:3, all.y = T, sort = F)$v_asset

> target
    t_registry t_pln  t_rss t_asset
6                    C16255  104892
1 GBR000C19926       C19926  104902
3               BRD5         104905
2 GBR000B11824               104916
5 GBR000A16190   LI8         104919
4               B291         104920

Upvotes: 1

Simon O&#39;Hanlon
Simon O&#39;Hanlon

Reputation: 60000

#  Find which rows from vessel are the match for target
x <- mapply( match , MoreArgs=list(incomparables="") , target , vessel )

#  Remove the NA's and incase you have more than one piece of information
#  available (multiple matches), reduce to a single number
idx <- apply(x,1, function(x) unique( x[!is.na(x) ] ))

#  Use the matches to get the id field from vessel
target$t_asset <- vessel$v_asset[idx]
target
#    t_registry t_pln  t_rss t_asset
#3               BRD5         104905
#2 GBR000B11824               104902
#4               B291         104916
#1 GBR000C19926       C19926  104892
#6                    C16255  104920
#5 GBR000A16190   LI8         104919

Upvotes: 1

Related Questions