Patrik_P
Patrik_P

Reputation: 3200

How to aggregate time periods grouped by identificators?

data <- read.table(text=
"ID1    ID2 From    To
12  127 20090701    20090703
12  127 20090704    20090711
12  127 20090707    20100831
12  127 20100901    99991231
18  880 19740401    20091129
18  880 20100608    99991231
12  127 20080102    20080305
12  127 20080306    20080329
12  128 20080620    20090204"
, header=T)

I want to transform the above data frame into the following form:

 result <- read.table(text=
"ID1    ID2 From    To
12  127 20080102    20080329
12  127 20090701    99991231
12  128 20080620    20090204
18  880 19740401    20091129
18  880 20100608    99991231"
, header=T)

The tranformation in few words is to group by ID1, ID2 and the unique time period where the subject was continuously active (not a single day break). That is dropping the multiple lines in cases when its not needed (one continuous activity period from Date 1 to Date 2).

Pointing out the solution is appreciated.

For starters, convert the dates:

df$From <- as.Date(as.character(df$From), format = "%Y%m%d")
df$To <- as.Date(as.character(df$To), format = "%Y%m%d")

Upvotes: 0

Views: 131

Answers (1)

Patrik_P
Patrik_P

Reputation: 3200

Figured one way to do this as follow:

library(dplyr)
data$From <- as.Date(as.character(data$From), format = "%Y%m%d")
data$To <- as.Date(as.character(data$To), format = "%Y%m%d")
data <- data %>% arrange(ID2, From) %>% mutate(Difference=9999)
marker <- 1
for (i in 2:length(data$ID1)){
  if(data$ID2[i]!=data$ID2[i-1]) marker=i
  else{
    data$Difference[i]=difftime(data$From[i], data$To[marker])
    if(data$Difference[i]>1) marker=i
    else if(data$To[i]>data$To[marker]) data$To[marker]=data$To[i]
  }
}
data <- filter(data, Difference>1)
data <- data[,-which(colnames(data)=="Difference")]

Can anyone provide other solution than for i loop?

Upvotes: 1

Related Questions