I'm trying to create new variables with mutate() in dplyr, and I can't understand the error I get. I've tried everything and have not run into this issue before.
I have a large dataset with over a million observations. Below are only the first 20.
This is what my data looks like:
data1 <- read.table(header=TRUE, text="IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483
7 3 04/09/06 2008 31/12/08 0 849
7 4 04/09/06 2009 31/12/09 0 1214
7 5 04/09/06 2010 31/12/10 0 1579
7 6 04/09/06 2011 31/12/11 0 1944
20 1 24/10/03 2003 31/12/03 0 68
20 2 24/10/03 2004 31/12/04 0 434
20 3 24/10/03 2005 31/12/05 0 799
20 4 24/10/03 2006 31/12/06 0 1164
20 5 24/10/03 2007 31/12/07 0 1529
20 6 24/10/03 2008 31/12/08 0 1895
20 7 24/10/03 2009 31/12/09 0 2260
20 8 24/10/03 2010 31/12/10 0 2625
20 9 24/10/03 2011 31/12/11 0 2990
87 1 17/01/06 2006 31/12/06 0 348
87 2 17/01/06 2007 31/12/07 0 713
87 3 17/01/06 2008 31/12/08 0 1079
87 4 17/01/06 2009 31/12/09 0 1444
87 5 17/01/06 2010 31/12/10 0 1809")
Note that the date variables (time and end) do not have this format in my actual dataset: they are stored as POSIXct with the format "%Y-%m-%d". They somehow got reformatted when I pasted them into Stack Overflow and formatted them as code.
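If you want the posted sample to mirror the real data, the text columns can be converted after reading them in. A minimal sketch, assuming the posted dates are day/month/two-digit-year (%d/%m/%y) and using UTC as the time zone (an assumption; the real data's time zone is not stated). sample_rows is a hypothetical two-row excerpt of the data above:

```r
# Two rows from the sample above, read the same way as data1
sample_rows <- read.table(header = TRUE, text = "IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483")

# Parse the dd/mm/yy strings into POSIXct, as in the real dataset.
# as.character() covers both factor (R < 4.0) and character (R >= 4.0) columns.
sample_rows$time <- as.POSIXct(as.character(sample_rows$time), format = "%d/%m/%y", tz = "UTC")
sample_rows$end  <- as.POSIXct(as.character(sample_rows$end),  format = "%d/%m/%y", tz = "UTC")

# Print the converted end dates in the real data's "%Y-%m-%d" layout
format(sample_rows$end, "%Y-%m-%d")
```

With the dates parsed, end - time and similar differences become date-time arithmetic instead of operations on strings.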
OK, here is the problem: I'm trying to create new survival-time variables in the same dataset. One is for a Cox regression model with start and stop times (survival is the stop time, and the new start variable should be called survcox).
I'm also fitting a Poisson regression, where the offset variable (i.e. the survival time variable) should be called survpois. This is the code I'm trying to use:
data2 <- data1 %>%
  group_by(IDnr) %>%
  mutate(survcox = ifelse(visit == 1, 0, lag(survival)),
         year_aar = substr(data1$year, 1, 4),
         first_day = as.POSIXct(paste0(year_aar, "-01-01-")),
         survpois = as.numeric(data1$end - first_day) + 1) %>%
  mutate(survpois = ifelse(year_aar > first_day, as.numeric(end - year_aar),
                           survpois)) %>%
  ungroup()
I get this error at this step:
Error: incompatible size (1345000), expecting 6 (the group size) or 1
I have no idea why I get this error, what it means, or why my code doesn't work.
Any help is appreciated. Thanks in advance!
I teased apart your code and found a few issues. The first is the one I mentioned in my comment above. The second is the class of end: in the data you provided, end is a factor (or a character vector in R >= 4.0, where read.table() no longer converts strings to factors by default). If that is also the case in your own data, you need to convert end to a date object. The third is the comparison year_aar > first_day: first_day is a date-time (POSIXct) object, whereas year_aar is a character string. Taking those into account, I modified your code:
library(dplyr)

data1 %>%
  group_by(IDnr) %>%
  mutate(survcox = ifelse(visit == 1, 0, lag(survival)),
         year_aar = substr(year, 1, 4),
         first_day = as.POSIXct(paste0(year_aar, "-01-01-")),
         # end is text in the posted sample, so parse it before subtracting
         survpois = as.numeric(as.POSIXct(end, format = "%d/%m/%y") - first_day) + 1) %>%
  # compare the years as numbers rather than a character string with a POSIXct
  mutate(survpois = ifelse(as.numeric(year_aar) > as.numeric(format(first_day, "%Y")),
                           as.numeric(as.POSIXct(end, format = "%d/%m/%y") - year_aar), survpois)) %>%
  ungroup()
Here is part of the output:
# IDnr visit time year end event survival survcox year_aar first_day survpois
#1 7 1 04/09/06 2006 31/12/06 0 118 0 2006 2006-01-01 365
#2 7 2 04/09/06 2007 31/12/07 0 483 118 2007 2007-01-01 365
#3 7 3 04/09/06 2008 31/12/08 0 849 483 2008 2008-01-01 366
#4 7 4 04/09/06 2009 31/12/09 0 1214 849 2009 2009-01-01 365
#5 7 5 04/09/06 2010 31/12/10 0 1579 1214 2010 2010-01-01 365
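For comparison, here is a minimal sketch of the same survcox and survpois computation using Date instead of POSIXct, run on a few rows of the posted sample. It assumes dplyr is installed and that year is the plain four-digit integer shown in the sample; visits is a hypothetical excerpt. Working with Date gives whole-day differences, whereas a difference between two local-time POSIXct values can be off by an hour (a fractional day) when a daylight-saving change falls in between. The second ifelse() step is left out: because first_day is built from year_aar, the condition comparing the two years is never TRUE, so that step leaves survpois unchanged.

```r
library(dplyr)

# A few rows from the question's sample (IDnr 7 and 87)
visits <- read.table(header = TRUE, text = "IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483
7 3 04/09/06 2008 31/12/08 0 849
87 1 17/01/06 2006 31/12/06 0 348
87 2 17/01/06 2007 31/12/07 0 713")

visits2 <- visits %>%
  # parse the dd/mm/yy text into Date (whole days, no time zone)
  mutate(end = as.Date(as.character(end), format = "%d/%m/%y")) %>%
  group_by(IDnr) %>%
  mutate(
    # Cox start time: previous stop time within the same IDnr, 0 at the first visit
    survcox = ifelse(visit == 1, 0, lag(survival)),
    # 1 January of the calendar year in `year`
    first_day = as.Date(paste0(year, "-01-01")),
    # days from 1 January up to and including `end`
    survpois = as.numeric(end - first_day) + 1
  ) %>%
  ungroup()

visits2$survcox
visits2$survpois
```

For IDnr 7 this gives survpois values of 365, 365 and 366 (2008 is a leap year), matching the output above.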
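It's because you reference the variables as data1$year (and data1$end too), which doesn't work with grouped data. data1$year is the entire column of the ungrouped data frame, so its length is the total number of rows (1345000) rather than the size of the current group (6), which is exactly what the error message complains about. Use the bare column names year and end inside mutate() instead.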
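A small illustration of this, using a hypothetical five-row data frame dat and assuming dplyr is installed: inside a grouped verb, a bare column name refers to the current group's rows only, while dat$year always refers to the full column.

```r
library(dplyr)

# Hypothetical five-row data frame with two groups (sizes 2 and 3)
dat <- data.frame(IDnr = c(7, 7, 20, 20, 20), year = 2006:2010)

# Compare what a bare column and dat$year refer to inside a grouped verb
sizes <- dat %>%
  group_by(IDnr) %>%
  summarise(n_bare   = length(year),      # only the current group's rows
            n_dollar = length(dat$year))  # always the whole column (5 rows)

# The question's pattern: a length-5 vector where each group expects 2 or 3 values
bad <- tryCatch(
  dat %>% group_by(IDnr) %>% mutate(year_aar = substr(dat$year, 1, 4)),
  error = function(e) "error"
)

# The fix: refer to the column by its bare name
ok <- dat %>%
  group_by(IDnr) %>%
  mutate(year_aar = substr(year, 1, 4)) %>%
  ungroup()
```

sizes shows n_bare as 2 and 3 but n_dollar as 5 for both groups; bad is "error", while ok gets one year_aar value per row.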