I have the following data frames, df1 and df2 (the real ones have around a million rows):
df1 <- data.frame(ID=c(23425, 84733, 49822, 39940), X=c(312,354,765,432))
df2 <- data.frame(ID=c(23425, 49822), Y=c(111,222))
I want to add a column Z to df1. Whenever an ID in df1 matches an ID in df2, Z should hold the corresponding Y value; if there is no match, Z should be 0.
The result should be:
df <- data.frame(ID=c(23425,84733, 49822, 39940), X=c(312,354,765,432), Z=c(111,0,222,0))
I stored the IDs from the second data frame in a vector and looped over them, but it takes forever.
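For comparison with the loop, here is a minimal vectorized base R sketch, assuming the IDs in df2 are unique. match() looks up every ID of df1 in a single call instead of one at a time, which is usually what makes the loop slow. With the sample data, Z becomes 111, 0, 222, 0.

```r
df1 <- data.frame(ID = c(23425, 84733, 49822, 39940), X = c(312, 354, 765, 432))
df2 <- data.frame(ID = c(23425, 49822), Y = c(111, 222))

# match() returns, for each ID in df1, the row of its first match in df2 (NA if none),
# so all lookups happen in one vectorized call instead of an R-level loop
idx <- match(df1$ID, df2$ID)

# Index Y by those positions; unmatched IDs give NA, which is then replaced by 0
df1$Z <- df2$Y[idx]
df1$Z[is.na(df1$Z)] <- 0

df1
```

If df2 contains duplicate IDs, match() silently uses the first occurrence, unlike a join, which would duplicate rows.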
Upvotes: 1
We can use data.table to do an update join and then replace the remaining NAs with 0:
library(data.table)
setDT(df1)[df2, Z := Y, on = .(ID)][is.na(Z), Z:= 0]
df1
# ID X Z
#1: 23425 312 111
#2: 84733 354 0
#3: 49822 765 222
#4: 39940 432 0
Upvotes: 1
You can accomplish this simply with merge. merge() matches the Y values in df2 to df1 by "ID". With the argument all.x = TRUE (a left join), every row of df1 is kept, and IDs with no match in df2 get NA for Y. (all = TRUE would be a full outer join, which would also keep IDs that appear only in df2.) Note that merge() sorts the result by ID by default.
df <- merge(df1, df2, by = "ID", all.x = TRUE)
df
#      ID   X   Y
# 1 23425 312 111
# 2 39940 432  NA
# 3 49822 765 222
# 4 84733 354  NA
df$Y <- ifelse(is.na(df$Y), 0, df$Y)
df
#      ID   X   Y
# 1 23425 312 111
# 2 39940 432   0
# 3 49822 765 222
# 4 84733 354   0
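The merge() result above keeps the column name Y and sorts the rows by ID. A minimal base R sketch that also reproduces the requested layout (column Z, original df1 row order), assuming the IDs in df2 are unique:

```r
df1 <- data.frame(ID = c(23425, 84733, 49822, 39940), X = c(312, 354, 765, 432))
df2 <- data.frame(ID = c(23425, 49822), Y = c(111, 222))

# Left join: keep every row of df1; unmatched IDs get NA in Y
df <- merge(df1, df2, by = "ID", all.x = TRUE)

# Rename Y to Z and replace the NAs from unmatched IDs with 0
names(df)[names(df) == "Y"] <- "Z"
df$Z[is.na(df$Z)] <- 0

# merge() reorders rows (sorted by ID here), so restore df1's original order
df <- df[match(df1$ID, df$ID), ]
rownames(df) <- NULL

df
```

The final reordering step relies on IDs being unique; with duplicated IDs, match() would keep only the first matching row for each ID.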
Upvotes: 0
I believe what you want is a left join:
library(dplyr)
df1 %>%
left_join(df2)
#> Joining, by = "ID"
#> # A tibble: 4 × 3
#> ID X Y
#> <dbl> <dbl> <dbl>
#> 1 23425 312 111
#> 2 84733 354 NA
#> 3 49822 765 222
#> 4 39940 432 NA
If you want it exactly as in your expected output, with the new column named Z and zeros instead of NA, you can add a few more lines:
library(tidyr)
df1 %>%
left_join(df2) %>%
rename(Z = Y) %>%
replace_na(replace = list(Z = 0))
#> Joining, by = "ID"
#> # A tibble: 4 × 3
#> ID X Z
#> <dbl> <dbl> <dbl>
#> 1 23425 312 111
#> 2 84733 354 0
#> 3 49822 765 222
#> 4 39940 432 0
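A minimal dplyr-only sketch of the same result, assuming dplyr is installed: coalesce() can replace the NAs instead of tidyr's replace_na(), and passing by = "ID" explicitly silences the "Joining, by" message. This is an alternative spelling, not the answer's original code. With the sample data, Z becomes 111, 0, 222, 0 and df1's row order is kept.

```r
library(dplyr)

df1 <- data.frame(ID = c(23425, 84733, 49822, 39940), X = c(312, 354, 765, 432))
df2 <- data.frame(ID = c(23425, 49822), Y = c(111, 222))

# left_join() keeps every row of df1 in its original order;
# coalesce(Y, 0) takes Y where present and 0 where the join produced NA
df <- df1 %>%
  left_join(df2, by = "ID") %>%
  mutate(Z = coalesce(Y, 0)) %>%
  select(-Y)

df
```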
Upvotes: 2