debster
debster

Reputation: 333

Sum of elements from two data.table with different length by matching row

I have two data.table of different length - dt1 and dt2:

> dt1
   Column_1 Column_2
1:        A        3
2:        B        2
3:        C        5
4:        D        1
5:        E        4


> dt2
   Column_1 Column_2
1:        A        2
2:        C        5
3:        D        4

I want to create another data.table with the sum of Column_2 by Column_1. The desired output should be:

> dt3
   Column_1 Column_2
1:        A        5
2:        B        2
3:        C       10
4:        D        5
5:        E        4

I have tried using this method:

dt3 = setDT(dt1)[Column_1 %in% dt1$Column_1, .(Column_2 = dt1[, "Column_2"] + dt2[, "Column_2"]), by = Column_1]

but due to the different length of the two data.table I get the error message:

Error in Ops.data.frame(dt1[, "Column_2"], dt2[, "Column_2"]) : 
  ‘+’ only defined for equally-sized data frames

Upvotes: 0

Views: 177

Answers (1)

y3kMyRon
y3kMyRon

Reputation: 71

What was stated in the comments (please credit @tic-tic-choc) the code could actually look like this:

library(dplyr)

dt3 <- dt1 %>%
  full_join(dt2, by = "Column_1", suffix = c("", "_dt2")) %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.x), 0, .x)),
         Column_2 = Column_2 + Column_2_dt2) %>%
  select(-ends_with("_dt2"))

Why I use a full_join? - Perhaps you have data in one frame that isn't present in the other one and vice versa...

You could use left_join, right_join or inner_join as well - just carefully read their documentation.

Why did i do the mutate(accross(where(is.numeric), ~ code)))thing?

Becaus when you do 5 + NA you get NA - therefor all NAs in numeric columns have been replaced by 0.0.

Upvotes: 1

Related Questions