Reputation: 3200
data <- read.table(text=
"ID1 ID2 From To
12 127 20090701 20090703
12 127 20090704 20090711
12 127 20090707 20100831
12 127 20100901 99991231
18 880 19740401 20091129
18 880 20100608 99991231
12 127 20080102 20080305
12 127 20080306 20080329
12 128 20080620 20090204"
, header=T)
I want to transform the above data frame into the following form:
result <- read.table(text=
"ID1 ID2 From To
12 127 20080102 20080329
12 127 20090701 99991231
12 128 20080620 20090204
18 880 19740401 20091129
18 880 20100608 99991231"
, header=T)
The tranformation in few words is to group by ID1, ID2 and the unique time period where the subject was continuously active (not a single day break). That is dropping the multiple lines in cases when its not needed (one continuous activity period from Date 1 to Date 2).
Pointing out the solution is appreciated.
For starters, convert the dates:
df$From <- as.Date(as.character(df$From), format = "%Y%m%d")
df$To <- as.Date(as.character(df$To), format = "%Y%m%d")
Upvotes: 0
Views: 131
Reputation: 3200
Figured one way to do this as follow:
library(dplyr)
data$From <- as.Date(as.character(data$From), format = "%Y%m%d")
data$To <- as.Date(as.character(data$To), format = "%Y%m%d")
data <- data %>% arrange(ID2, From) %>% mutate(Difference=9999)
marker <- 1
for (i in 2:length(data$ID1)){
if(data$ID2[i]!=data$ID2[i-1]) marker=i
else{
data$Difference[i]=difftime(data$From[i], data$To[marker])
if(data$Difference[i]>1) marker=i
else if(data$To[i]>data$To[marker]) data$To[marker]=data$To[i]
}
}
data <- filter(data, Difference>1)
data <- data[,-which(colnames(data)=="Difference")]
Can anyone provide other solution than for i loop?
Upvotes: 1