Reputation: 315
I have a dataset that looks like this:
date |country | crisis |
2011 q2 | AT | 0 |
2011 q3 | AT | 0 |
2011 q4 | AT | 1 |
2012 q1 | AT | 0 |
2012 q2 | AT | 1 |
2011 q1 | BE | 0 |
2011 q3 | BE | 0 |
2011 q4 | BE | 0 |
2012 q1 | BE | 1 |
2012 q2 | BE | 0 |
And I would like to compute another variable for each country-crisis pair that is 0 when crisis is 1 and then create a sequence starting from zero, like this
date |country | crisis | AT_crisis1 | AT_crisis2 | BE_crisis 1
2011 q2 | AT | 0 | -2 | -4 | NA
2011 q3 | AT | 0 | -1 | -3 | NA
2011 q4 | AT | 1 | 0 | -2 | NA
2012 q1 | AT | 0 | 1 | -1 | NA
2012 q2 | AT | 1 | 2 | 0 | NA
2011 q1 | BE | 0 | NA | NA | -3
2011 q3 | BE | 0 | NA | NA | -2
2011 q4 | BE | 0 | NA | NA | -1
2012 q1 | BE | 1 | NA | NA | 0
2012 q2 | BE | 0 | NA | NA | +1
Here a reproducible example for the first dataset
library(zoo)
date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)
df <- data.frame(date, country, crisis)
Thank you.
Upvotes: 1
Views: 85
Reputation: 25223
An option using data.table
:
library(data.table)
rbindlist(setDT(df)[, {
idx <- .SD[crisis==1L, which=TRUE]
names(idx) <- paste0(.BY$country, "_crisis", seq_along(idx))
.(.(c(country=.BY$country, .SD,
lapply(idx, function(k) seq_len(.N) - k))))
},
country]$V1,
use.names=TRUE, fill=TRUE)
output:
country date crisis AT_crisis1 AT_crisis2 BE_crisis1
1: AT 2011 Q2 0 -2 -4 NA
2: AT 2011 Q3 0 -1 -3 NA
3: AT 2011 Q4 1 0 -2 NA
4: AT 2012 Q1 0 1 -1 NA
5: AT 2012 Q2 1 2 0 NA
6: BE 2011 Q2 0 NA NA -3
7: BE 2011 Q3 0 NA NA -2
8: BE 2011 Q4 0 NA NA -1
9: BE 2012 Q1 1 NA NA 0
10: BE 2012 Q2 0 NA NA 1
Upvotes: 1
Reputation: 47350
Here's a way :
library(zoo)
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library(tidyverse)
date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)
df <- data.frame(date, country, crisis)
df %>%
mutate_at("date", as.character) %>% # as it's a factor in sample data
group_split(country) %>%
map_dfr(~imap_dfr(which(.$crisis==1), ~mutate(..3, key =.y, value = seq(1-.,len = nrow(..3))), .)) %>%
unite(key, country, key, sep = "_crisis", remove = FALSE) %>%
spread(key, value) %>%
arrange(country, date)
#> # A tibble: 10 x 6
#> date country crisis AT_crisis1 AT_crisis2 BE_crisis1
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 2011 Q2 AT 0 -2 -4 NA
#> 2 2011 Q3 AT 0 -1 -3 NA
#> 3 2011 Q4 AT 1 0 -2 NA
#> 4 2012 Q1 AT 0 1 -1 NA
#> 5 2012 Q2 AT 1 2 0 NA
#> 6 2011 Q2 BE 0 NA NA -3
#> 7 2011 Q3 BE 0 NA NA -2
#> 8 2011 Q4 BE 0 NA NA -1
#> 9 2012 Q1 BE 1 NA NA 0
#> 10 2012 Q2 BE 0 NA NA 1
Created on 2020-01-09 by the reprex package (v0.3.0)
The double map call is probably a bit intimidating, here's what it does :
which(crisis)
, we use an imap
function to keep track of the index too because we need it to name the column (store it in key
at this step). For each which
value we create a dataframe with a value
column containing the sequence that will be moved to the new columns later, for this we need the data frame object, which we pass as a 3rd parameter, hence the last .
of the line, and the ..3
in the call.Upvotes: 3