Jordan_b
Jordan_b

Reputation: 315

Generate sequence from a value using R

I have a dataset that looks like this:

date    |country   | crisis | 
2011 q2 |   AT     |    0   |
2011 q3 |   AT     |    0   |
2011 q4 |   AT     |    1   |
2012 q1 |   AT     |    0   |
2012 q2 |   AT     |    1   | 
2011 q1 |   BE     |    0   |
2011 q3 |   BE     |    0   |
2011 q4 |   BE     |    0   |
2012 q1 |   BE     |    1   |
2012 q2 |   BE     |    0   | 

And I would like to compute another variable for each country-crisis pair that is 0 when crisis is 1 and then create a sequence starting from zero, like this

date    |country   | crisis | AT_crisis1 | AT_crisis2 | BE_crisis 1
2011 q2 |   AT     |    0   |  -2        |     -4     |    NA
2011 q3 |   AT     |    0   |  -1        |     -3     |    NA
2011 q4 |   AT     |    1   |   0        |     -2     |    NA
2012 q1 |   AT     |    0   |   1        |     -1     |    NA
2012 q2 |   AT     |    1   |   2        |      0     |    NA
2011 q1 |   BE     |    0   |   NA       |     NA     |    -3
2011 q3 |   BE     |    0   |   NA       |     NA     |    -2
2011 q4 |   BE     |    0   |   NA       |     NA     |    -1
2012 q1 |   BE     |    1   |   NA       |     NA     |     0
2012 q2 |   BE     |    0   |   NA       |     NA     |    +1

Here a reproducible example for the first dataset

library(zoo)

date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)

df <- data.frame(date, country, crisis)

Thank you.

Upvotes: 1

Views: 85

Answers (2)

chinsoon12
chinsoon12

Reputation: 25223

An option using data.table:

library(data.table)
rbindlist(setDT(df)[, {
        idx <- .SD[crisis==1L, which=TRUE]
        names(idx) <- paste0(.BY$country, "_crisis", seq_along(idx))

        .(.(c(country=.BY$country, .SD, 
            lapply(idx, function(k) seq_len(.N) - k))))
    },
    country]$V1, 
use.names=TRUE, fill=TRUE)

output:

    country    date crisis AT_crisis1 AT_crisis2 BE_crisis1
 1:      AT 2011 Q2      0         -2         -4         NA
 2:      AT 2011 Q3      0         -1         -3         NA
 3:      AT 2011 Q4      1          0         -2         NA
 4:      AT 2012 Q1      0          1         -1         NA
 5:      AT 2012 Q2      1          2          0         NA
 6:      BE 2011 Q2      0         NA         NA         -3
 7:      BE 2011 Q3      0         NA         NA         -2
 8:      BE 2011 Q4      0         NA         NA         -1
 9:      BE 2012 Q1      1         NA         NA          0
10:      BE 2012 Q2      0         NA         NA          1

Upvotes: 1

moodymudskipper
moodymudskipper

Reputation: 47350

Here's a way :

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(tidyverse)
date <- c("2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2")
date <- as.yearqtr(date)
country <- c("AT","AT","AT","AT","AT","BE","BE","BE","BE","BE")
crisis <- c(0,0,1,0,1,0,0,0,1,0)
df <- data.frame(date, country, crisis)

df %>%
  mutate_at("date", as.character) %>% # as it's a factor in sample data
  group_split(country) %>%
  map_dfr(~imap_dfr(which(.$crisis==1), ~mutate(..3, key =.y, value = seq(1-.,len = nrow(..3))), .)) %>%
  unite(key, country, key, sep = "_crisis", remove = FALSE) %>%
  spread(key, value) %>%
  arrange(country, date)
#> # A tibble: 10 x 6
#>    date    country crisis AT_crisis1 AT_crisis2 BE_crisis1
#>    <chr>   <fct>    <dbl>      <dbl>      <dbl>      <dbl>
#>  1 2011 Q2 AT           0         -2         -4         NA
#>  2 2011 Q3 AT           0         -1         -3         NA
#>  3 2011 Q4 AT           1          0         -2         NA
#>  4 2012 Q1 AT           0          1         -1         NA
#>  5 2012 Q2 AT           1          2          0         NA
#>  6 2011 Q2 BE           0         NA         NA         -3
#>  7 2011 Q3 BE           0         NA         NA         -2
#>  8 2011 Q4 BE           0         NA         NA         -1
#>  9 2012 Q1 BE           1         NA         NA          0
#> 10 2012 Q2 BE           0         NA         NA          1

Created on 2020-01-09 by the reprex package (v0.3.0)

The double map call is probably a bit intimidating, here's what it does :

  • The outside map call applies a transformation on each country dataframe created by split, and binds them together
  • the second map call loops on values of which(crisis), we use an imap function to keep track of the index too because we need it to name the column (store it in key at this step). For each which value we create a dataframe with a value column containing the sequence that will be moved to the new columns later, for this we need the data frame object, which we pass as a 3rd parameter, hence the last . of the line, and the ..3 in the call.

Upvotes: 3

Related Questions