Rodrigo Guinea

Reputation: 328

How do I add a third column to one data frame, filled with values from a column of a second data frame wherever the IDs match?

I have the following dataframes df1 and df2 (the true ones have around a million rows):

df1 <- data.frame(ID=c(23425, 84733, 49822, 39940), X=c(312,354,765,432))
df2 <- data.frame(ID=c(23425, 49822), Y=c(111,222))

I want to add an additional column Z to df1. Each time an ID from df1 matches an ID from df2, the corresponding Y value must go into that third column. If there is no match, a zero must be added.

The result must be this one:

df <- data.frame(ID=c(23425,84733, 49822, 39940), X=c(312,354,765,432), Z=c(111,0,222,0))

I stored the IDs from the second data frame in a vector and used a loop, but it takes forever.

Upvotes: 1

Views: 91

Answers (3)

akrun

Reputation: 887118

We can use data.table to do a join that assigns Z by reference, and then replace the NA values with 0:

library(data.table)
setDT(df1)[df2, Z := Y, on = .(ID)][is.na(Z), Z := 0]
df1
#     ID   X   Z
#1: 23425 312 111
#2: 84733 354   0
#3: 49822 765 222
#4: 39940 432   0
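
For comparison, the same lookup can be sketched in base R with match(), which avoids both the loop and any extra package. This is only a minimal sketch: it assumes the IDs in df2 are unique (match() returns the first hit only), and the helper name idx is made up for illustration.

```r
# Sample data from the question
df1 <- data.frame(ID = c(23425, 84733, 49822, 39940), X = c(312, 354, 765, 432))
df2 <- data.frame(ID = c(23425, 49822), Y = c(111, 222))

# For each df1$ID, find its row position in df2 (NA when the ID is absent).
# Assumption: IDs in df2 are unique; match() only reports the first occurrence.
idx <- match(df1$ID, df2$ID)

# Look up Y by position; unmatched IDs become 0
df1$Z <- ifelse(is.na(idx), 0, df2$Y[idx])
df1
```

Because match() is vectorised, this should scale far better than an explicit loop over the IDs, although it has not been timed here on data of the size mentioned in the question.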

Upvotes: 1

griffmer

Reputation: 377

You can do this with merge, which matches the Y values in df2 to df1 by "ID". With all = TRUE, every row from both data frames is kept, and any ID in df1 without a match in df2 gets NA for Y. (To keep only the rows of df1, use all.x = TRUE instead.) Note that merge sorts the result by ID by default.

Merge the two datasets by ID, keeping all rows from each:

df <- merge(df1, df2, by = "ID", all = TRUE)
df
     ID   X   Y
1 23425 312 111
2 39940 432  NA
3 49822 765 222
4 84733 354  NA

If you want unmatched IDs to show 0 instead of NA, replace the NA values in the Y column:

df$Y <- ifelse(is.na(df$Y), 0, df$Y)
df
     ID   X   Y
1 23425 312 111
2 39940 432   0
3 49822 765 222
4 84733 354   0
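
The merged result comes back sorted by ID and with the column named Y, whereas the question asks for df1's original order and a column named Z. One possible way to get that exact layout is sketched below. It uses all.x = TRUE so that only df1's rows are kept, and the helper column row_order is a made-up name used purely for illustration.

```r
# Sample data from the question
df1 <- data.frame(ID = c(23425, 84733, 49822, 39940), X = c(312, 354, 765, 432))
df2 <- data.frame(ID = c(23425, 49822), Y = c(111, 222))

# Remember df1's original row order, because merge() sorts by the key
df1$row_order <- seq_len(nrow(df1))

# all.x = TRUE keeps every row of df1 and drops df2 rows with no match in df1
df <- merge(df1, df2, by = "ID", all.x = TRUE)

# Replace NA with 0 and rename Y to Z
df$Y[is.na(df$Y)] <- 0
names(df)[names(df) == "Y"] <- "Z"

# Restore the original order and drop the helper column
df <- df[order(df$row_order), c("ID", "X", "Z")]
rownames(df) <- NULL
df
```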

Upvotes: 0

Julia Silge

Reputation: 11613

I believe what you want is a join:

library(dplyr)

df1 %>% 
    left_join(df2)

#> Joining, by = "ID"
#> # A tibble: 4 × 3
#>      ID     X     Y
#>   <dbl> <dbl> <dbl>
#> 1 23425   312   111
#> 2 84733   354    NA
#> 3 49822   765   222
#> 4 39940   432    NA

If you want it exactly the way you have it with a new column name and zeroes instead of NA, you can add a few more lines:

library(tidyr)
df1 %>%
    left_join(df2) %>%
    rename(Z = Y) %>%
    replace_na(replace = list(Z = 0))

#> Joining, by = "ID"
#> # A tibble: 4 × 3
#>      ID     X     Z
#>   <dbl> <dbl> <dbl>
#> 1 23425   312   111
#> 2 84733   354     0
#> 3 49822   765   222
#> 4 39940   432     0
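
A slightly different variant, sketched here as one possibility rather than the only way: naming the key explicitly with by = "ID" silences the "Joining, by" message, and dplyr's coalesce() fills the missing values without needing tidyr. The name result is just a placeholder.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(ID = c(23425, 84733, 49822, 39940), X = c(312, 354, 765, 432))
df2 <- data.frame(ID = c(23425, 49822), Y = c(111, 222))

result <- df1 %>%
    left_join(df2, by = "ID") %>%   # keep every row of df1, in its original order
    mutate(Z = coalesce(Y, 0)) %>%  # take Y where present, otherwise 0
    select(ID, X, Z)                # drop the intermediate Y column

result
```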

Upvotes: 2
