BioChemoinformatics
BioChemoinformatics

Reputation: 715

effectively add one data.table length according another table condition by matching corresponding column

I have two data.table object:

dt <- data.table(
  a = c(1, 1, 2), 
  s = c(1, 2, 5), 
  o = c("A", "A", "A")
)

dt2 <- data.table(
  a = c(1, 1, 2, 2),
  g = c(1, 2, 3, 4)  
)  

There are two elements of g for each a in dt2. I want to match a in dt with the a in dt2. Then in dt, each row will duplicate 2 times (because there are 2 element for each a in dt2. Certainly, it may be different number of elements for each a in dt2. That means the condition in dt2 determine how I append the corresponding g of dt2 into dt), and append the corresponding g of dt2 at the last column of dt. The final table should be like this:

fnl <- data.table(
  a = c(1, 1, 1, 1, 2, 2), 
  s = c(1, 1, 2, 2, 5, 5), 
  o = c("A", "A", "A", "A", "A", "A"),
  g = c(1, 2, 1, 2, 3, 4)
)

For clear, here is the two input tables and one output table.

> dt
   a s o
1: 1 1 A
2: 1 2 A
3: 2 5 A
> dt2
   a g
1: 1 1
2: 1 2
3: 2 3
4: 2 4
> fnl
   a s o g
1: 1 1 A 1
2: 1 1 A 2
3: 1 2 A 1
4: 1 2 A 2
5: 2 5 A 3
6: 2 5 A 4

I know that using for-loop I can do the task, but my data is too big, can you help figure out what is the effective way to do this?

Upvotes: 1

Views: 36

Answers (1)

Martin Gal
Martin Gal

Reputation: 16978

Since you are using data.tables it's pretty easy:

dt[dt2, on='a']

gives you

> dt[dt2, on='a']
   a s o g
1: 1 1 A 1
2: 1 2 A 1
3: 1 1 A 2
4: 1 2 A 2
5: 2 5 A 3
6: 2 5 A 4

Upvotes: 2

Related Questions