updating the dataframe according to the specified conditions in R

library(data.table)
a <- data.table(p = seq(1, 5, 0.5), s = 1:9)
b <- data.table(p = c(2.0, 6, 3.5), s = c(4, 7, 0))

I have 2 dataframes, dataframe "a" is the basis, dataframe "b" contains updates. I need to make changes to the dataframe "a" using the following conditions:

  1. if in the dataframe "b" in the column "s" there is a value of 0, then it is necessary to remove from the dataframe "a" all rows with the corresponding value "p"
  2. if in the dataframe "in" the value in the column "s" differs from the corresponding value "s" in the dataframe "a", then you must replace it with the value from the dataframe "b"
  3. if the dataframe "b" has a unique value "p", then you need to add this line to the dataframe "a"

result:

     p s
1: 1.0 1
2: 1.5 2
3: 2.0 4
4: 2.5 4
5: 3.0 5
6: 4.0 7
7: 4.5 8
8: 5.0 9
9: 6.0 7

Upvotes: 3

Views: 55

Answers (4)

AnilGoyal
AnilGoyal

Reputation: 26218

using coalesce it will be easier

a %>%
  full_join(b, by = "p") %>%
  transmute(p, s = coalesce(s.y, s.x)) %>%
  filter(s != 0)

     p s
1: 1.0 1
2: 1.5 2
3: 2.0 4
4: 2.5 4
5: 3.0 5
6: 4.0 7
7: 4.5 8
8: 5.0 9
9: 6.0 7

Upvotes: 1

Anoushiravan R
Anoushiravan R

Reputation: 21908

I first used full_join so that we have all the rows of both data frames. In case s.x and s.y are not equal and s.y is not an NA value it replaces s.x even in case of zeros which are later deleted. If s.x is NA and s.y is not which means that the row id for s.y coming from data frame b does not exist in data a it again replaces it.

library(dplyr)
library(tidyr)
library(stringr)

a %>%
  full_join(b, by = "p") %>%
  mutate(s.x = ifelse(s.x != s.y & !is.na(s.y), s.y, s.x), 
         s.x = ifelse(is.na(s.x) & !is.na(s.y), s.y, s.x)) %>%
  select(-s.y) %>%
  filter(s.x != 0) %>%
  rename_with(~ str_remove(., ".x"), ends_with(".x"))

     p s
1: 1.0 1
2: 1.5 2
3: 2.0 4
4: 2.5 4
5: 3.0 5
6: 4.0 7
7: 4.5 8
8: 5.0 9
9: 6.0 7

Upvotes: 3

Till
Till

Reputation: 6628

Using dplyr (question does not specify if data.table syntax is necessary).

  1. Get values for p that need to be deleted.
  2. Filter out values found in 1. and use distinct() to make sure all rows are unique.
library(dplyr)

to_delete <- 
  filter(b, s == 0) %>% 
  pull(p)

bind_rows(a, b) %>% 
  filter(!p %in% to_delete) %>% 
  distinct(p, .keep_all = TRUE)
#>      p s
#> 1: 1.0 1
#> 2: 1.5 2
#> 3: 2.0 3
#> 4: 2.5 4
#> 5: 3.0 5
#> 6: 4.0 7
#> 7: 4.5 8
#> 8: 5.0 9
#> 9: 6.0 7

Upvotes: 4

akrun
akrun

Reputation: 886948

Perhaps a join on 'p' (assuming they have the same precision of values) to update the 's' by the corresponding 's' from 'b' (i.s), remove the rows where 's' is 0 and rbind the rows of 'b' whose 'p' values are not in a's p

rbind(a[b,  s :=  i.s, on = .(p)][s != 0], b[!p %in% a$p])

-output

#    p s
#1: 1.0 1
#2: 1.5 2
#3: 2.0 4
#4: 2.5 4
#5: 3.0 5
#6: 4.0 7
#7: 4.5 8
#8: 5.0 9
#9: 6.0 7

Upvotes: 2

Related Questions