sharkey32

Matching data across all cases in R

I have the following two datasets (data1 and data2) and would like to match them on the match variable, so that every row of data1 picks up all of the unit1 values that data2 has for the same match value.

data1 <- data.frame(
  match = c(rep("a", 7), rep("b", 7), rep("c", 3), rep("d", 2))
)

data2 <- data.frame(
  match = c(rep("a", 4), rep("b", 5), rep("c", 2), rep("d", 9)),
  unit1 = c(300, 200, 300, 600, 250, 100, 90, 50, 10, 9, 9.5,
            80, 90, 50, 20, 30, 40, 70, 15, 190)
)

Note that the real datasets are large, so I need an efficient way of doing this matching. The result should be a single dataset in the following format:

  match unit1_1 unit1_2 unit1_3 unit1_4 unit1_5 unit1_6 unit1_7 unit1_8 unit1_9
      a     300     200     300     600
      a     300     200     300     600
      a     300     200     300     600
      a     300     200     300     600
      a     300     200     300     600
      a     300     200     300     600
      a     300     200     300     600
      b     250     100      90      50      10
      b     250     100      90      50      10
      b     250     100      90      50      10
      b     250     100      90      50      10
      b     250     100      90      50      10
      b     250     100      90      50      10
      b     250     100      90      50      10
      c       9     9.5
      c       9     9.5
      c       9     9.5
      d      80      90      50      20      30      40      70      15     190
      d      80      90      50      20      30      40      70      15     190

rosscova

You can do this in a few ways; here's one using data.table functions. First, convert both data frames to data.tables by reference:

library( data.table )
setDT( data2 )
setDT( data1 )

Add a column that numbers the rows within each match group. It drives the cast to wide format and also gives the new columns the names you want.

data2[ , record := paste0( "unit1_", seq_len( .N ) ), by = match ]
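To see what the helper column contains, here is a small standalone sketch on the question's toy data. It also shows data.table's rowid(), which (assuming a data.table version recent enough to have it) produces the same within-group counter without a by clause; record2 is a hypothetical column name used only for the comparison.

```r
library(data.table)

# Toy data from the question, built directly as a data.table
data2 <- data.table(
  match = c(rep("a", 4), rep("b", 5), rep("c", 2), rep("d", 9)),
  unit1 = c(300, 200, 300, 600, 250, 100, 90, 50, 10, 9, 9.5,
            80, 90, 50, 20, 30, 40, 70, 15, 190)
)

# seq_len(.N) is evaluated once per match group,
# so rows are numbered 1, 2, ... within each group
data2[, record := paste0("unit1_", seq_len(.N)), by = match]

# rowid(match) gives the same running count per match value, no by= needed
# (record2 is a hypothetical column, only for this comparison)
data2[, record2 := paste0("unit1_", rowid(match))]

stopifnot(identical(data2$record, data2$record2))
print(data2[match == "c"])
```

Either form works; rowid() is just a shorter way to write the same counter.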

Convert from long to wide format.

data3 <- dcast( data2, match ~ record, value.var = "unit1", fill = NA_real_ )    
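One caveat, as far as I know: dcast() orders the new columns by sorting the values of record, and because record is character, a group with 10 or more rows would sort unit1_10 ahead of unit1_2. A sketch that sidesteps this, assuming you are happy to rename the columns afterwards, is to cast on an integer counter and add the prefix with setnames(); value_cols is just a helper name introduced here.

```r
library(data.table)

data2 <- data.table(
  match = c(rep("a", 4), rep("b", 5), rep("c", 2), rep("d", 9)),
  unit1 = c(300, 200, 300, 600, 250, 100, 90, 50, 10, 9, 9.5,
            80, 90, 50, 20, 30, 40, 70, 15, 190)
)

# Integer counter within each match group: 1, 2, 3, ...
data2[, record := rowid(match)]

# Casting on an integer column sorts the new columns numerically
data3 <- dcast(data2, match ~ record, value.var = "unit1", fill = NA_real_)

# Turn the column names "1", "2", ... into "unit1_1", "unit1_2", ...
value_cols <- setdiff(names(data3), "match")
setnames(data3, value_cols, paste0("unit1_", value_cols))
print(data3)
```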

Now use the match column to merge the wide table back onto your original data1:

data4 <- merge( data1, data3, by = "match", all = TRUE )
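Note that all = TRUE makes merge() a full outer join, so a match value that appears only in data2 would also get a row. If you only want the rows of data1, in their original order, data.table's join syntax X[i, on = ...] is an alternative that may be worth benchmarking on the real, large data. A minimal self-contained sketch:

```r
library(data.table)

data1 <- data.table(match = c(rep("a", 7), rep("b", 7), rep("c", 3), rep("d", 2)))
data2 <- data.table(
  match = c(rep("a", 4), rep("b", 5), rep("c", 2), rep("d", 9)),
  unit1 = c(300, 200, 300, 600, 250, 100, 90, 50, 10, 9, 9.5,
            80, 90, 50, 20, 30, 40, 70, 15, 190)
)

# Wide table: one row per match value, one column per unit1 value
data2[, record := paste0("unit1_", rowid(match))]
data3 <- dcast(data2, match ~ record, value.var = "unit1")

# data3[data1, on = "match"] looks up each row of data1 in data3,
# returning one row per data1 row, in data1's order
data4 <- data3[data1, on = "match"]
print(data4)
```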

data4

#     match unit1_1 unit1_2 unit1_3 unit1_4 unit1_5 unit1_6 unit1_7 unit1_8 unit1_9
#  1:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  2:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  3:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  4:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  5:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  6:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  7:     a     300   200.0     300     600      NA      NA      NA      NA      NA
#  8:     b     250   100.0      90      50      10      NA      NA      NA      NA
#  9:     b     250   100.0      90      50      10      NA      NA      NA      NA
# 10:     b     250   100.0      90      50      10      NA      NA      NA      NA
# 11:     b     250   100.0      90      50      10      NA      NA      NA      NA
# 12:     b     250   100.0      90      50      10      NA      NA      NA      NA
# 13:     b     250   100.0      90      50      10      NA      NA      NA      NA
# 14:     b     250   100.0      90      50      10      NA      NA      NA      NA
# 15:     c       9     9.5      NA      NA      NA      NA      NA      NA      NA
# 16:     c       9     9.5      NA      NA      NA      NA      NA      NA      NA
# 17:     c       9     9.5      NA      NA      NA      NA      NA      NA      NA
# 18:     d      80    90.0      50      20      30      40      70      15     190
# 19:     d      80    90.0      50      20      30      40      70      15     190
