Jammy
Jammy

Reputation: 37

update rows based on column condition in r

My dataset looks like:

PID  V1       V2          V3                  V4     V5
123  1   13-06-2004   12-08-2002        19-03-2003   2
123  2   19-07-2008   18-05-2006        31-05-2007   2
234  1   08-07-2010   07-07-2007        07-05-2008   3
345  1   11-12-2012   13-11-2011        12-06-2012   1
456  1   17-09-2018   15-08-2015        29-10-2016   3

I have done year calculation in V5 by V2 - V3 / 365 Now I need PID with single record alone need to be update by v2-v4 column and expected output:

PID     V1      V2         V3           V4      V5
123     1   13-06-2004  12-08-2002  19-03-2003  2
123     2   19-07-2008  18-05-2006  31-05-2007  2
234     1   08-07-2010  07-07-2007  07-05-2008  2
345     1   11-12-2012  13-11-2011  12-06-2012  0.6
456     1   17-09-2018  15-08-2015  29-10-2016  2

I struggle to update with single record alone.

Upvotes: 0

Views: 219

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389235

My numbers don't match to your expected output but based on your description I think you are trying to update V5 values for groups with 1 row which can be done as :

library(dplyr)

df %>%
  mutate(across(V2:V4, lubridate::dmy)) %>%
  group_by(PID) %>%
  mutate(V5 = if(n() == 1) as.numeric((V2-V4)/365) else V5)

#    PID    V1 V2         V3         V4            V5
#  <int> <int> <date>     <date>     <date>     <dbl>
#1   123     1 2004-06-13 2002-08-12 2003-03-19 2    
#2   123     2 2008-07-19 2006-05-18 2007-05-31 2    
#3   234     1 2010-07-08 2007-07-07 2008-05-07 2.17 
#4   345     1 2012-12-11 2011-11-13 2012-06-12 0.499
#5   456     1 2018-09-17 2015-08-15 2016-10-29 1.88 

Upvotes: 1

Bas
Bas

Reputation: 4658

If I understand correctly, you want to change the value of the V5 column whenever for that specific PID there is one row.

That can be achieved as such:

df %>% 
  add_count(PID) %>% 
  mutate(V5 = if_else(n == 1, as.numeric((V2-V4)/365), V5))

Which gives:

# A tibble: 5 x 7
    PID    V1 V2         V3         V4            V5     n
  <dbl> <dbl> <date>     <date>     <date>     <dbl> <int>
1   123     1 2004-06-13 2002-08-12 2003-03-19 2         2
2   123     2 2008-07-19 2006-05-18 2007-05-31 2         2
3   234     1 2010-07-08 2007-07-07 2008-05-07 2.17      1
4   345     1 2012-12-11 2011-11-13 2012-06-12 0.499     1
5   456     1 2018-09-17 2015-08-15 2016-10-29 1.88      1

Data:

df <- structure(list(PID = c(123, 123, 234, 345, 456), V1 = c(1, 2, 
1, 1, 1), V2 = structure(c(12582, 14079, 14798, 15685, 17791), class = "Date"), 
    V3 = structure(c(11911, 13286, 13701, 15291, 16662), class = "Date"), 
    V4 = structure(c(12130, 13664, 14006, 15503, 17103), class = "Date"), 
    V5 = c(2, 2, 3, 1, 3)), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
    cols = list(PID = structure(list(), class = c("collector_double", 
    "collector")), V1 = structure(list(), class = c("collector_double", 
    "collector")), V2 = structure(list(), class = c("collector_character", 
    "collector")), V3 = structure(list(), class = c("collector_character", 
    "collector")), V4 = structure(list(), class = c("collector_character", 
    "collector")), V5 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Upvotes: 0

Related Questions