Reputation: 333
I have two data.table
of different length - dt1
and dt2
:
> dt1
Column_1 Column_2
1: A 3
2: B 2
3: C 5
4: D 1
5: E 4
> dt2
Column_1 Column_2
1: A 2
2: C 5
3: D 4
I want to create another data.table
with the sum of Column_2
by Column_1
. The desired output should be:
> dt3
Column_1 Column_2
1: A 5
2: B 2
3: C 10
4: D 5
5: E 4
I have tried using this method:
dt3 = setDT(dt1)[Column_1 %in% dt1$Column_1, .(Column_2 = dt1[, "Column_2"] + dt2[, "Column_2"]), by = Column_1]
but due to the different length of the two data.table
I get the error message:
Error in Ops.data.frame(dt1[, "Column_2"], dt2[, "Column_2"]) :
‘+’ only defined for equally-sized data frames
Upvotes: 0
Views: 177
Reputation: 71
What was stated in the comments (please credit @tic-tic-choc) the code could actually look like this:
library(dplyr)
dt3 <- dt1 %>%
full_join(dt2, by = "Column_1", suffix = c("", "_dt2")) %>%
mutate(across(where(is.numeric), ~ ifelse(is.na(.x), 0, .x)),
Column_2 = Column_2 + Column_2_dt2) %>%
select(-ends_with("_dt2"))
Why I use a full_join? - Perhaps you have data in one frame that isn't present in the other one and vice versa...
You could use left_join, right_join or inner_join as well - just carefully read their documentation.
Why did i do the mutate(accross(where(is.numeric), ~ code)))
thing?
Becaus when you do 5 + NA you get NA - therefor all NAs in numeric columns have been replaced by 0.0.
Upvotes: 1