lorenzlorg
lorenzlorg

Reputation: 135

Update some values in a column from an other dataframe in R

I've 2 dataframes.

Dataframe A with millions of values like this:

ID_CLI CHURN
34 1
16 1
19 1
23 1
45 1

Dataframe B with thousands of values like this:

ID_CLI CHURN
23 0
45 0
19 0

and I want this type of output:

ID_CLI CHURN
34 1
16 1
19 0
23 0
45 0

so I want to update dataframe A taking the value from dataframe B (in R).

Upvotes: 2

Views: 53

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389275

You can do a full_join and use coalesce.

library(dplyr)

res <- full_join(df1, df2, by = 'ID_CLI') %>%
          transmute(ID_CLI, 
                    CHURN = coalesce(CHURN.y, CHURN.x))

#  ID_CLI CHURN
#1     34     1
#2     16     1
#3     19     0
#4     23     0
#5     45     0

In base R -

res <- transform(merge(df1, df2, by = 'ID_CLI', all = TRUE), 
          CHURN = ifelse(is.na(CHURN.y), CHURN.x, CHURN.y))[names(df1)]

Upvotes: 2

Anoushiravan R
Anoushiravan R

Reputation: 21938

A merely tidyverse solution:

library(dplyr)

df1 %>%
  left_join(df2, by = "ID_CLI") %>%
  mutate(across(ends_with(".x"), ~ ifelse(!is.na(get(sub(".x", ".y", cur_column()))), 
                                          get(sub(".x", ".y", cur_column())), .x))) %>%
  select(!ends_with(".y")) %>%
  rename_with(~ sub(".x", "", .), ends_with(".x"))

  ID_CLI CHURN
1     34     1
2     16     1
3     19     0
4     23     0
5     45     0
6     19     0

Upvotes: 2

akrun
akrun

Reputation: 887901

If we have a big dataset, data.table, join update would be fast i.e. join on by the 'ID_CLI' column and assign (:=) the CHURN from second data (i.CHURN) to the first data column 'CHURN'

library(data.table)
setDT(df1)[df2, CHURN := i.CHURN, on = .(ID_CLI)] 

-output

df1
   ID_CLI CHURN
1:     34     1
2:     16     1
3:     19     0
4:     23     0
5:     45     0

data

df1 <- structure(list(ID_CLI = c(34L, 16L, 19L, 23L, 45L), CHURN = c(1L, 
1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -5L
))

df2 <- structure(list(ID_CLI = c(23L, 45L, 19L), CHURN = c(0L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 3

Related Questions