Zach

Reputation: 996

Time to event for panel data

I have a panel data set of country-years. For each country, I would like to calculate the time since the last event, as well as a running total of events that I can decay over time. I am using the timeSinceEvent function from the doBy package, which returns a data frame with the values I want, but I am having trouble applying it to my main data frame.

data <- structure(list(ccode.a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 40L, 40L, 
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 
40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 
41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 
42L, 42L, 42L, 42L, 42L), year = c(1975, 1976, 1977, 1978, 1979, 
1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 
1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 
2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 1977, 1978, 
1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 1977, 
1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 
1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 
2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 1976, 
1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 
1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1975, 
1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 
1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 
1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004), onset.a = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ccode.a", "year", 
"onset.a"), row.names = c(NA, 200L), class = "data.frame")

I have tried using this:

last.step <- function(x) {
  temp <- timeSinceEvent(x$onset.a, x$year)
  cbind(x[,1],temp) #timeSinceEvent cuts off the country ID
}
result <- do.call("rbind", by(data, data$ccode.a, last.step))

As well as this:

test <- by(data, data$ccode.a, function(x) timeSinceEvent(data$onset.a, data$year))

Neither gets me what I want. I stepped through the function and it seems to do the right thing, so I guess the problem is in the way I am calling it?

Upvotes: 0

Views: 337

Answers (3)

Zach

Reputation: 996

I ended up having to modify timeSinceEvent from the doBy package a bit. Here is the final code that worked. Kudos to lselzer for pointing out rbind.fill in plyr, and to RoyalTS for pointing out that timeSinceEvent returns NULL when the yvar argument is all zeros.

# Modified version of doBy's timeSinceEvent that always returns a data frame,
# even when yvar contains no events.
panel.tse <- function(yvar, tvar = seq_along(yvar)) {
  if (!(is.numeric(yvar) || is.logical(yvar))) {
    stop("yvar must be either numeric or logical")
  }
  yvar[is.na(yvar)] <- 0
  run <- cumsum(yvar)  # running count of events so far
  un <- unique(run)
  tlist <- vector("list", length(un))
  for (i in seq_along(un)) {
    # time elapsed since the first observation of this run
    tt <- tvar[run == un[[i]]]
    tlist[[i]] <- tt - tt[1]
  }
  timeAfterEvent <- unlist(tlist)
  # observations before the first event have no "time after event"
  timeAfterEvent[run == 0] <- NA
  run[run == 0] <- NA
  data.frame(yvar = yvar, tvar = tvar, run = run, tae = timeAfterEvent)
}

library(plyr)  # for rbind.fill

last.step <- function(x) {
  temp <- panel.tse(x$onset.a, x$year)
  cbind(ccode.a = x$ccode.a, temp)  # re-attach the country ID
}

result <- do.call(rbind.fill, by(data, data$ccode.a, last.step))
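For comparison, here is a minimal sketch of the same run / time-after-event logic using only base R's ave, so no per-country split and rbind step is needed. The toy data frame and its values are made up for illustration; only the column names follow the question. This is an alternative under those assumptions, not what doBy does internally.

```r
# Toy panel (illustrative values): country 1 has events in 2001 and 2003,
# country 2 has no events at all.
toy <- data.frame(
  ccode.a = rep(c(1L, 2L), each = 5),
  year    = rep(2000:2004, times = 2),
  onset.a = c(0, 1, 0, 1, 0,  0, 0, 0, 0, 0)
)

# Running count of events within each country.
toy$run <- ave(toy$onset.a, toy$ccode.a, FUN = cumsum)

# Years elapsed since the first row of each (country, run) block.
toy$tae <- ave(toy$year, toy$ccode.a, toy$run,
               FUN = function(y) y - y[1])

# Rows before a country's first event have no "time after event".
# Blank tae first, because the condition relies on run still being 0.
toy$tae[toy$run == 0] <- NA
toy$run[toy$run == 0] <- NA
```

A country with no events simply ends up with NA in both run and tae, so there is no empty piece to special-case.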

Upvotes: 0

Luciano Selzer

Reputation: 10016

Since the pieces do not all have the same columns, you should use rbind.fill() from plyr. It fills the missing columns with NA.

library(doBy)  # for timeSinceEvent
library(plyr)  # for rbind.fill

last.step <- function(x) {
  temp <- timeSinceEvent(x$onset.a, x$year)
  cbind(x[,1],temp) #timeSinceEvent cuts off the country ID
}
result <- do.call(rbind.fill, by(data, data$ccode.a, last.step))

However, this won't return the "empty" pieces, i.e. the ones that contain only x[,1]. It will only bind the pieces that are data frames. I don't know whether this is the expected behaviour or what you want.

Upvotes: 1

RoyalTS

Reputation: 10203

It seems to me the problem is simply that there are no events for ccode.a == 20, so timeSinceEvent returns NULL when applied to that subset. This means that last.step returns objects of different dimensions for different values of ccode.a, and thus the rbind fails.

Not exactly a solution, but perhaps a better understanding of where the problem lies already helps.
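One way to work around this, sketched below, is to guard the per-country step so that a NULL result is replaced by an NA-filled data frame of the same shape. Note that maybe.null.step is a hypothetical stand-in that only mimics the NULL-on-no-events behaviour described above; it is not the doBy function, and the toy data are made up for illustration.

```r
# Toy panel (illustrative values): country 2 has no events.
toy <- data.frame(
  ccode.a = rep(c(1L, 2L), each = 3),
  year    = rep(2000:2002, times = 2),
  onset.a = c(0, 1, 0,  0, 0, 0)
)

# Hypothetical stand-in for a per-group step that returns NULL when the
# group contains no events.
maybe.null.step <- function(x) {
  if (!any(x$onset.a == 1)) return(NULL)
  data.frame(year = x$year, run = cumsum(x$onset.a))
}

# Guard: substitute an NA-filled data frame with the same columns for NULL,
# then re-attach the country ID so every piece has identical columns.
safe.step <- function(x) {
  out <- maybe.null.step(x)
  if (is.null(out)) {
    out <- data.frame(year = x$year, run = NA_real_)
  }
  cbind(ccode.a = x$ccode.a, out)
}

pieces <- lapply(split(toy, toy$ccode.a), safe.step)
result <- do.call(rbind, pieces)
```

Because every piece now has the same three columns, a plain rbind is enough and the rows for the event-free country are kept rather than dropped.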

Upvotes: 1
