Rfanatic

Reputation: 2282

Understanding how the merge function works in R

Please see my sample data below. dim(df1) is (10, 5) and dim(df2) is (10, 5). I would like to merge the two data frames using this command: merged.data <- merge(df1, df2, by="serial"). What I don't understand is why the dimension of merged.data is (26, 9).

    df1 <- structure(list(X = 1:10, serial = c(11051018L, 11051018L, 11090618L, 
    11090618L, 11120502L, 11120502L, 11120502L, 11120502L, 11120506L, 
    11120506L), grp = c(220L, 508L, 254L, 348L, 326L, 328L, 612L, 
    614L, 320L, 680L), Start_End = c("t0830_0845, t1830_1845", "t0830_0845, t1845_1900", 
    "t0900_0915, t1145_1200", "t0900_0915, t1300_1315", "t0715_0730, t1215_1230", 
    "t1245_1300, t1745_1800", "t0830_0845, t1400_1415", "t1445_1500, t2000_2015", 
    "t1300_1315, t1845_1900", "t0700_0715, t1345_1400"), Duration = c(41L, 
    42L, 12L, 17L, 21L, 21L, 23L, 22L, 24L, 28L)), row.names = c(NA, 
    10L), class = "data.frame")

    df2 <- structure(list(X = 1:10, serial = c(11051018L, 11051018L, 11090618L, 
    11090618L, 11120502L, 11120502L, 11120502L, 11120502L, 11120506L, 
    11151207L), grp = c(248L, 562L, 276L, 382L, 358L, 360L, 682L, 
    684L, 352L, 260L), Start_End = c("t0830_0845, t1730_1745", "t0900_0915, t1945_2000", 
    "t0900_0915, t1445_1500", "t0900_0915, t1245_1300", "t0800_0815, t1215_1230", 
    "t1245_1300, t1745_1800", "t0830_0845, t1245_1300", "t1400_1415, t1845_1900", 
    "t1700_1715, t2145_2200", "t0900_0915, t1245_1300"), Duration = c(37L, 
    44L, 24L, 16L, 18L, 21L, 18L, 20L, 20L, 16L)), row.names = c(NA, 
    10L), class = "data.frame")

Upvotes: 1

Views: 36

Answers (1)

akrun

Reputation: 887118

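merge() pairs every row of df1 with every row of df2 that has the same serial, so a serial that is duplicated in both data frames multiplies the row count (the 4 x 4 rows of serial 11120502 alone contribute 16 rows). The 9 columns are serial plus the 4 remaining columns from each side. To avoid the extra rows, we can collapse the duplicates in one of the datasets into list columns and then merge, so that no information is lost: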

    mergedout <- merge(df1, aggregate(. ~ serial, df2, I), by = 'serial')
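
For anyone wondering where the 26 rows in the question come from, here is a rough sketch that reproduces the count by hand. It only uses the serial values copied from the question; the stand-in data frames a and b and their columns x and y are made up for the cross-check and are not part of the original data.

```r
# Key columns copied from df1 and df2 in the question
s1 <- c(11051018L, 11051018L, 11090618L, 11090618L, 11120502L,
        11120502L, 11120502L, 11120502L, 11120506L, 11120506L)
s2 <- c(11051018L, 11051018L, 11090618L, 11090618L, 11120502L,
        11120502L, 11120502L, 11120502L, 11120506L, 11151207L)

# Count how often each serial occurs on each side
n1 <- table(s1)
n2 <- table(s2)

# Only serials present in both data frames survive the default (inner) merge
common <- intersect(names(n1), names(n2))

# Each common serial contributes (count in df1) * (count in df2) rows
rows_per_key <- as.vector(n1[common]) * as.vector(n2[common])
names(rows_per_key) <- common
expected_rows <- sum(rows_per_key)

# Cross-check against merge() on stripped-down stand-ins
# (x and y are hypothetical columns, used only for illustration)
a <- data.frame(serial = s1, x = seq_along(s1))
b <- data.frame(serial = s2, y = seq_along(s2))
m <- merge(a, b, by = "serial")
stopifnot(nrow(m) == expected_rows)
```

Here rows_per_key works out to 4, 4, 16 and 2 for the four shared serials, which sums to the 26 rows reported in the question. Serial 11151207 appears only in df2, so it is dropped unless all = TRUE (or all.y = TRUE) is passed to merge().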

Upvotes: 1
