user227710
user227710

Reputation: 3194

Cbind two data but with some modifications using data.table

I have two datasets as follows:

data1<-structure(list(gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3, 
3, 3, 5, 5, 5), carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 
4, 4, 4, 4, 6, 8)), .Names = c("gear", "carb"), class = "data.frame", row.names = c(NA, 
-18L))

data1
   gear carb
1     3    2
2     3    2
3     3    2
4     3    2
5     5    2
6     3    3
7     3    3
8     3    3
9     4    4
10    4    4
11    3    4
12    3    4
13    3    4
14    3    4
15    3    4
16    5    4
17    5    6
18    5    8

data2<-structure(list(carb = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 
4, 4, 6, 8), fac = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 1L, 1L), hello = c(NA, 0.292389553859123, 
0.584779107718246, 0.804071273112588, 0.804071273112588, 0.402035636556294, 
NA, 0.460230801434478, 1.25285051501608, 1.15057700358619, 0.869324847154013, 
0.818188091439071, 0.894893225011484, 0.792619713581601, 0.51136755714942, 
NA, NA), hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, NA, NA)), row.names = c(NA, -17L), class = "data.frame", .Names = c("carb", 
"fac", "hello", "hello2"))

   carb fac     hello hello2
1     1   1        NA     NA
2     2   1 0.2923896      5
3     2   2 0.5847791      5
4     2   3 0.8040713      5
5     2   4 0.8040713      5
6     2   5 0.4020356      4
7     3   1        NA     NA
8     4   1 0.4602308      1
9     4   2 1.2528505      1
10    4   3 1.1505770      2
11    4   4 0.8693248      2
12    4   5 0.8181881      1
13    4   6 0.8948932      1
14    4   7 0.7926197      2
15    4   8 0.5113676      2
16    6   1        NA     NA
17    8   1        NA     NA

Assume data1 is the main data. I want to cbind data1 and data2 (NO MERGE) . But, as you can see they don't have the same number of rows. One way, I am trying to achieve this is using common var carb. If category of carb is in data 2 but not in data1, I don't want to cbind that category from data2. For example, in the above data, carb with value 1 is in data2 but not in data1, so ignore this while cbinding. If the number of rows for categories that exist in both dataset are not the same, I will use the number of rows for each category from data1. For example, for carb value =3, the number of row in data 1 is 3 whereas in data2 it is 1. So, I need to have 3 rows for carb 3 in data 2 before cbind. The additional two rows should just replicate the row that is in data1. My desired output ( The order needs to remain intact as in data1):

     +----------------------------------------+
     | gear   carb   fac       hello   hello2 |
     |----------------------------------------|
  1. |    3      2     1   0.2923896        5 |
  2. |    3      2     2   0.5847791        5 |
  3. |    3      2     3   0.8040713        5 |
  4. |    3      2     4   0.8040713        5 |
  5. |    5      2     5   0.4020356        4 |
     |----------------------------------------|
  6. |    3      3     1          NA       NA |
  7. |    3      3     1          NA       NA |
  8. |    3      3     1          NA       NA |
  9. |    4      4     1   0.4602308        1 |
 10. |    4      4     2   1.2528505        1 |
     |----------------------------------------|
 11. |    3      4     3    1.150577        2 |
 12. |    3      4     4   0.8693248        2 |
 13. |    3      4     5   0.8181881        1 |
 14. |    3      4     6   0.8948932        1 |
 15. |    3      4     7   0.7926197        2 |
     |----------------------------------------|
 16. |    5      4     8   0.5113676        2 |
 17. |    5      6     1          NA       NA |
 18. |    5      8     1          NA       NA |
     +----------------------------------------+

I was wondering whether there is some specific function in data.table package that does this kind.

Upvotes: 3

Views: 187

Answers (2)

Steven Beaupr&#233;
Steven Beaupr&#233;

Reputation: 21621

Using the same idea as @Frank answer (adding fac to data1):

library(dplyr)

data1 %>%
  group_by(carb) %>%
  mutate(fac = row_number()) %>%
  left_join(., data2)

Which gives:

#Source: local data frame [18 x 5]
#Groups: carb
#
#   gear carb fac     hello hello2
#1     3    2   1 0.2923896      5
#2     3    2   2 0.5847791      5
#3     3    2   3 0.8040713      5
#4     3    2   4 0.8040713      5
#5     5    2   5 0.4020356      4
#6     3    3   1        NA     NA
#7     3    3   2        NA     NA
#8     3    3   3        NA     NA
#9     4    4   1 0.4602308      1
#10    4    4   2 1.2528505      1
#11    3    4   3 1.1505770      2
#12    3    4   4 0.8693248      2
#13    3    4   5 0.8181881      1
#14    3    4   6 0.8948932      1
#15    3    4   7 0.7926197      2
#16    5    4   8 0.5113676      2
#17    5    6   1        NA     NA
#18    5    8   1        NA     NA

Upvotes: 2

Frank
Frank

Reputation: 66819

I think you actually do want a merge:

setDT(data1)
setDT(data2)

data1[,fac:=1:.N,by=carb]

setkey(data1,carb,fac)
setkey(data2,carb,fac)
data2[data1]

which gives

    carb fac     hello hello2 gear
 1:    2   1 0.2923896      5    3
 2:    2   2 0.5847791      5    3
 3:    2   3 0.8040713      5    3
 4:    2   4 0.8040713      5    3
 5:    2   5 0.4020356      4    5
 6:    3   1        NA     NA    3
 7:    3   2        NA     NA    3
 8:    3   3        NA     NA    3
 9:    4   1 0.4602308      1    4
10:    4   2 1.2528505      1    4
11:    4   3 1.1505770      2    3
12:    4   4 0.8693248      2    3
13:    4   5 0.8181881      1    3
14:    4   6 0.8948932      1    3
15:    4   7 0.7926197      2    3
16:    4   8 0.5113676      2    5
17:    6   1        NA     NA    5
18:    8   1        NA     NA    5

Upvotes: 5

Related Questions