europe1
europe1

Reputation: 3

Reformatting Panel Data according to a time and event variable

I have a panel dataset with many variables. The three most relevant variables are: "cid" (country code), 'time" (0-65), and "event" (0, 1, 2, 3, 4, 5, 6). I am trying to run a cox regression (using coxph), however, since the time variable has different starting and ending points for each country, I need to first create a start time and end time variable. Here is where I run into my problem.

Here is what a sample of the three main variables may look like:

> data
      cid   time event
 [1,] "AFG" "20" "0"  
 [2,] "AFG" "21" "0"  
 [3,] "AFG" "22" "0"  
 [4,] "AFG" "23" "0"  
 [5,] "AFG" "24" "0"  
 [6,] "AFG" "25" "0"  
 [7,] "AFG" "26" "1"  
 [8,] "AFG" "27" "1"  
 [9,] "AFG" "28" "1"  
[10,] "AFG" "29" "1"   

The idea is to convert this data into the following:

> data
     cid   time1 time2 event
[1,] "AFG" "20"  "25"  "0"  
[2,] "AFG" "26"  "29"  "1" 

How exactly does one go about doing this (keeping in mind that there are quite a few other explanatory variables in my dataset)?

Upvotes: 0

Views: 105

Answers (2)

Sowmya S. Manian
Sowmya S. Manian

Reputation: 3833

subset1<- data[data$event==0,]
subset1

subset2<- data[data$event==1,]
subset2

s1<- cbind(cid="AFG",time1=min(subset1$time),time2=max(subset1$time),event = 0)
s1

s2<- cbind(cid="AFG",time1=min(subset2$time),time2=max(subset2$time),event = 1)
s2

data1=rbind(s1,s2)
data1
#       cid   time1 time2 event
# [1,] "AFG" "20"  "25"  "0"  
# [2,] "AFG" "26"  "29"  "1"  

Hope this would help a little.

Upvotes: 0

iboboboru
iboboboru

Reputation: 1102

You could use dplyr and pipe. This solution will work if your data is always ordered sequentially as in your example.

data<-data.frame(cid=rep("AFG",10),time=seq(20,29,1),event=c(0,0,0,0,0,0,1,1,1,1))


library(dplyr)

data %>% group_by(cid,event) %>% 
    summarise(time1=min(time),time2=max(time))

Upvotes: 1

Related Questions