Reputation: 37
My dataset looks like:
PID V1 V2 V3 V4 V5
123 1 13-06-2004 12-08-2002 19-03-2003 2
123 2 19-07-2008 18-05-2006 31-05-2007 2
234 1 08-07-2010 07-07-2007 07-05-2008 3
345 1 11-12-2012 13-11-2011 12-06-2012 1
456 1 17-09-2018 15-08-2015 29-10-2016 3
I have done year calculation in V5
by V2 - V3 / 365
Now I need PID
with single record alone need to be update
by v2-v4
column and expected output:
PID V1 V2 V3 V4 V5
123 1 13-06-2004 12-08-2002 19-03-2003 2
123 2 19-07-2008 18-05-2006 31-05-2007 2
234 1 08-07-2010 07-07-2007 07-05-2008 2
345 1 11-12-2012 13-11-2011 12-06-2012 0.6
456 1 17-09-2018 15-08-2015 29-10-2016 2
I struggle to update with single record alone.
Upvotes: 0
Views: 219
Reputation: 389235
My numbers don't match to your expected output but based on your description I think you are trying to update V5
values for groups with 1 row which can be done as :
library(dplyr)
df %>%
mutate(across(V2:V4, lubridate::dmy)) %>%
group_by(PID) %>%
mutate(V5 = if(n() == 1) as.numeric((V2-V4)/365) else V5)
# PID V1 V2 V3 V4 V5
# <int> <int> <date> <date> <date> <dbl>
#1 123 1 2004-06-13 2002-08-12 2003-03-19 2
#2 123 2 2008-07-19 2006-05-18 2007-05-31 2
#3 234 1 2010-07-08 2007-07-07 2008-05-07 2.17
#4 345 1 2012-12-11 2011-11-13 2012-06-12 0.499
#5 456 1 2018-09-17 2015-08-15 2016-10-29 1.88
Upvotes: 1
Reputation: 4658
If I understand correctly, you want to change the value of the V5
column whenever for that specific PID
there is one row.
That can be achieved as such:
df %>%
add_count(PID) %>%
mutate(V5 = if_else(n == 1, as.numeric((V2-V4)/365), V5))
Which gives:
# A tibble: 5 x 7
PID V1 V2 V3 V4 V5 n
<dbl> <dbl> <date> <date> <date> <dbl> <int>
1 123 1 2004-06-13 2002-08-12 2003-03-19 2 2
2 123 2 2008-07-19 2006-05-18 2007-05-31 2 2
3 234 1 2010-07-08 2007-07-07 2008-05-07 2.17 1
4 345 1 2012-12-11 2011-11-13 2012-06-12 0.499 1
5 456 1 2018-09-17 2015-08-15 2016-10-29 1.88 1
Data:
df <- structure(list(PID = c(123, 123, 234, 345, 456), V1 = c(1, 2,
1, 1, 1), V2 = structure(c(12582, 14079, 14798, 15685, 17791), class = "Date"),
V3 = structure(c(11911, 13286, 13701, 15291, 16662), class = "Date"),
V4 = structure(c(12130, 13664, 14006, 15503, 17103), class = "Date"),
V5 = c(2, 2, 3, 1, 3)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(PID = structure(list(), class = c("collector_double",
"collector")), V1 = structure(list(), class = c("collector_double",
"collector")), V2 = structure(list(), class = c("collector_character",
"collector")), V3 = structure(list(), class = c("collector_character",
"collector")), V4 = structure(list(), class = c("collector_character",
"collector")), V5 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Upvotes: 0