Reputation: 37
I have a dataset (df1) on hundreds of national crises, where each observation is a crisis event at the country level with a start and an end date. I also have the date when the crisis was announced (yyyy-mm-dd format), and a bunch of other crisis characteristics.
df1 <- data.frame(cbind(eventID=c(1,2,3,4), country=c("ALB","ALB","ARG","ARG"), start=c(1994, 1998, 1998, 1991), end=c(1996,1999,1999,1993), announcement=c("1994-11-01","1998-03-01","1998-07-01","1992-01-01"), x1=c(6,2,8,7), x2=c("a","q","k","b")))
eventID country start end announcement x1 x2
1 ALB 1994 1996 1994-11-01 6 a
2 ALB 1998 1999 1998-03-01 2 q
3 ARG 1998 1999 1998-07-01 8 k
4 ARG 1991 1993 1992-01-01 7 b
I need to make df2, a panel of countries with annual observations from the earliest "start" year to the latest "end" year. I want to have a dummy variable, "crisis", that equals 1 for the years between "start" and "end" in df1, and 0 otherwise. I want "announcement" to contain the announcement date in df1 for the year with an announcement, and "NA" otherwise. I would like the extra crisis characteristics, x1 and x2, to show up for crisis years to which they correspond, and "NA" otherwise.
I also need observations for each country for years in which no country has a crisis (in df2: 1997).
df2 <- data.frame(cbind(year=c(1991,1992,1993,1994,1995,1996,1997,1998,1999,1991,1992,1993,1994,1995,1996,1997,1998,1999), country=c("ALB","ALB","ALB","ALB","ALB","ALB","ALB","ALB","ALB","ARG","ARG","ARG","ARG","ARG","ARG","ARG","ARG","ARG"),crisis=c(0,0,0,1,1,1,0,1,1,1,1,1,0,0,0,0,1,1), announcement=c(NA, NA,NA,"1994-11-01",NA,NA,NA,"1998-03-01",NA,NA,"1992-01-01",NA,NA,NA,NA,NA,"1998-07-01"), x1=c(NA,NA,NA,6,6,6,NA,2,2,8,8,8,NA,NA,NA,NA,7,7), x2=c(NA,NA,NA,"a","a","a",NA,"q","q","k","k","k",NA,NA,NA,NA,"b","b")))
year country crisis announcement x1 x2
1991 ALB 0 NA NA NA
1992 ALB 0 NA NA NA
1993 ALB 0 NA NA NA
1994 ALB 1 1994-11-01 6 a
1995 ALB 1 NA 6 a
1996 ALB 1 NA 6 a
1997 ALB 0 NA NA NA
1998 ALB 1 1998-03-01 2 q
1999 ALB 1 NA 2 q
1991 ARG 1 NA 8 k
1992 ARG 1 1992-01-01 8 k
1993 ARG 1 NA 8 k
1994 ARG 0 NA NA NA
1995 ARG 0 NA NA NA
1996 ARG 0 NA NA NA
1997 ARG 0 NA NA NA
1998 ARG 1 1998-07-01 7 b
1999 ARG 1 NA 7 b
I would love any suggestions! I'm stumped as to how to replicate the observations for each year, but only include x1 and x2 values when my new "crisis" dummy = 1
Thanks!
Upvotes: 0
Views: 55
Reputation: 124183
Making use of dplyr and tidyr this could be achieved like so:
library(dplyr)
library(tidyr)
df1 <- data.frame(cbind(eventID=c(1,2,3,4), country=c("ALB","ALB","ARG","ARG"), start=c(1994, 1998, 1998, 1991), end=c(1996,1999,1999,1993), announcement=c("1994-11-01","1998-03-01","1998-07-01","1992-01-01"), x1=c(6,2,8,7), x2=c("a","q","k","b")))
df1 %>%
mutate(year = factor(start, levels = min(start):max(end))) %>%
complete(year, country) %>%
mutate(year = as.numeric(as.character(year))) %>%
arrange(country, year) %>%
group_by(country) %>%
fill(eventID, end, x1, x2) %>%
ungroup() %>%
mutate(across(c(eventID, end, x1, x2), ~ ifelse(end < year, NA, .)),
crisis = as.numeric(!is.na(eventID)))
#> # A tibble: 18 x 9
#> year country eventID start end announcement x1 x2 crisis
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 1991 ALB <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 2 1992 ALB <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 3 1993 ALB <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 4 1994 ALB 1 1994 1996 1994-11-01 6 a 1
#> 5 1995 ALB 1 <NA> 1996 <NA> 6 a 1
#> 6 1996 ALB 1 <NA> 1996 <NA> 6 a 1
#> 7 1997 ALB <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 8 1998 ALB 2 1998 1999 1998-03-01 2 q 1
#> 9 1999 ALB 2 <NA> 1999 <NA> 2 q 1
#> 10 1991 ARG 4 1991 1993 1992-01-01 7 b 1
#> 11 1992 ARG 4 <NA> 1993 <NA> 7 b 1
#> 12 1993 ARG 4 <NA> 1993 <NA> 7 b 1
#> 13 1994 ARG <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 14 1995 ARG <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 15 1996 ARG <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 16 1997 ARG <NA> <NA> <NA> <NA> <NA> <NA> 0
#> 17 1998 ARG 3 1998 1999 1998-07-01 8 k 1
#> 18 1999 ARG 3 <NA> 1999 <NA> 8 k 1
Upvotes: 0