Reputation: 49
I have the following two datasets (data1 and data2) and would like to match based on the match variable across all possible cases.
data1 <- data.frame(
match = c(rep("a",7),rep("b",7),rep("c",3),rep("d",2))
)
data2 <- data.frame(
match = c(rep("a",4),rep("b",5),rep("c",2),rep("d",9)),
unit1 =
c(300,200,300,600,250,100,90,50,10,9,9.5,80,90,50,20,30,40,70,15,190)
)
These datasets are large, so I need an efficient way of doing this matching. The goal is to create a single dataset of the following format:
match unit1_1 unit1_2 unit1_3 unit1_4 unit1_5 unit1_6 unit1_7 unit1_8 unit1_9
a         300     200     300     600
a         300     200     300     600
a         300     200     300     600
a         300     200     300     600
a         300     200     300     600
a         300     200     300     600
a         300     200     300     600
b         250     100      90      50      10
b         250     100      90      50      10
b         250     100      90      50      10
b         250     100      90      50      10
b         250     100      90      50      10
b         250     100      90      50      10
b         250     100      90      50      10
c           9     9.5
c           9     9.5
c           9     9.5
d          80      90      50      20      30      40      70      15     190
d          80      90      50      20      30      40      70      15     190
Upvotes: 0
Views: 101
Reputation: 5580
You can do this in a few ways; here's one using data.table functions:
library( data.table )
setDT( data2 )
setDT( data1 )
Add a column to help casting to wide format, and to set the column names the way you want them.
data2[ , record := paste0( "unit1_", seq_len( .N ) ), by = match ]
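To see what that helper column does, here is a small sketch on a toy table (toy is a made-up name standing in for data2, not part of your data). Because of by = match, the counter restarts at 1 for each group.

```r
library(data.table)

# 'toy' is a hypothetical stand-in for data2
toy <- data.table(match = c("a", "a", "b"), unit1 = c(300, 200, 250))

# .N is the number of rows in the current group,
# so seq_len(.N) counts 1..N separately within each value of match
toy[, record := paste0("unit1_", seq_len(.N)), by = match]

toy$record
# "unit1_1" "unit1_2" "unit1_1"
```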
Convert from long to wide format.
data3 <- dcast( data2, match ~ record, value.var = "unit1", fill = NA_real_ )
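One thing to watch with larger data: as far as I know, dcast orders the new columns by the sorted values of the right-hand-side variable, so with character labels a unit1_10 column would land before unit1_2 once a group has more than 9 values. A possible workaround, sketched on hypothetical toy data, is to cast on the integer counter itself (via rowid()) and add the prefix afterwards.

```r
library(data.table)

# hypothetical toy data: one group with 11 values,
# so the counter reaches two digits
toy <- data.table(match = "a", unit1 = 1:11)

# cast on the integer counter so the columns sort as 1, 2, ..., 11
wide <- dcast(toy, match ~ rowid(match), value.var = "unit1")

# add the "unit1_" prefix to every column except the key
old <- setdiff(names(wide), "match")
setnames(wide, old, paste0("unit1_", old))

names(wide)
```

With your example data (at most 9 values per group) the original approach gives the same column order, so this only matters if groups can grow beyond that.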
Now use the match column to merge that with your initial data1 set:
data4 <- merge( data1, data3, by = "match", all = TRUE )
data4
# match unit1_1 unit1_2 unit1_3 unit1_4 unit1_5 unit1_6 unit1_7 unit1_8 unit1_9
# 1: a 300 200.0 300 600 NA NA NA NA NA
# 2: a 300 200.0 300 600 NA NA NA NA NA
# 3: a 300 200.0 300 600 NA NA NA NA NA
# 4: a 300 200.0 300 600 NA NA NA NA NA
# 5: a 300 200.0 300 600 NA NA NA NA NA
# 6: a 300 200.0 300 600 NA NA NA NA NA
# 7: a 300 200.0 300 600 NA NA NA NA NA
# 8: b 250 100.0 90 50 10 NA NA NA NA
# 9: b 250 100.0 90 50 10 NA NA NA NA
# 10: b 250 100.0 90 50 10 NA NA NA NA
# 11: b 250 100.0 90 50 10 NA NA NA NA
# 12: b 250 100.0 90 50 10 NA NA NA NA
# 13: b 250 100.0 90 50 10 NA NA NA NA
# 14: b 250 100.0 90 50 10 NA NA NA NA
# 15: c 9 9.5 NA NA NA NA NA NA NA
# 16: c 9 9.5 NA NA NA NA NA NA NA
# 17: c 9 9.5 NA NA NA NA NA NA NA
# 18: d 80 90.0 50 20 30 40 70 15 190
# 19: d 80 90.0 50 20 30 40 70 15 190
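Note that merge() sorts the result by the key column by default. If the original row order of data1 matters, a join written in data.table's X[i, on = ] form might be worth trying instead, since it returns one row per row of i, in i's order. A minimal sketch, where d1 and d3 are hypothetical toy stand-ins for data1 and the wide table data3:

```r
library(data.table)

# hypothetical toy inputs standing in for data1 and data3
d1 <- data.table(match = c("b", "a", "b"))
d3 <- data.table(match   = c("a", "b"),
                 unit1_1 = c(300, 250),
                 unit1_2 = c(200, 100))

# for each row of d1, look up the matching row of d3;
# the result keeps d1's row order
res <- d3[d1, on = "match"]

res$match
# "b" "a" "b"
```

Whether this is faster than merge() on your data is something I'd benchmark rather than assume.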
Upvotes: 1