Eric

Reputation: 55

How to Create Conditional Dummy Variables (Panel Data) in R?

I have panel data with two waves, 18 and 21, and an employment-status variable that takes 4 values.

I want to create a dummy that takes the value 1 if the person is employed in both waves and 0 otherwise. However, my code produces a dummy containing only zeros:

df$dummy <- df %>%
  group_by(NEW_id) %>%
  arrange(New_id, WAVE_NO) %>%
  mutate(dummy = case_when(WAVE_NO==18 & WAVE_NO==21 & EMPLOYMENT_STATUS=="Employed" ~ 1, TRUE ~ 0))
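A minimal sketch of why this condition always gives 0, using a made-up two-row history for a single person. The condition is evaluated one row at a time, and no single row can have WAVE_NO equal to 18 and 21 at once, so the person's rows have to be looked at together. Two smaller points about the code above: the pipe returns a whole data frame, so its result belongs in df itself (df <- df %>% ...) rather than in df$dummy, and R is case-sensitive, so NEW_id and New_id are different column names.

```r
# Hypothetical two-row history for one person (values are made up)
WAVE_NO <- c(18L, 21L)
EMPLOYMENT_STATUS <- c("Employed", "Employed")

# Row-wise: each element tests one wave against both 18 and 21, so it is never TRUE
row_wise <- WAVE_NO == 18 & WAVE_NO == 21 & EMPLOYMENT_STATUS == "Employed"
print(row_wise)  # FALSE FALSE

# Person-wise: look across all of the person's rows before combining the tests
person_wise <- any(WAVE_NO == 18) && any(WAVE_NO == 21) &&
  all(EMPLOYMENT_STATUS == "Employed")
print(person_wise)  # TRUE
```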


Upvotes: 0

Views: 438

Answers (2)

Dion Groothof

Reputation: 1456

We can use split to split the data frame by id. Because split returns a list, we can use lapply to apply the same operation to each element of that list (here: creating the dummy variable). The output of lapply is a list as well, but we want a data.frame, so we call do.call(), which calls a function with the list's elements as its arguments (here: rbind, which stacks the per-person data frames back into one).

set.seed(1)

n <- 10L
K <- 2L
df <- data.frame(
  id = rep(1L:n, each=K),
  wave = rep(c(18L,21L), n),
  employment = sample(c('Employed', 'Unemployed'), n*K, replace = TRUE)
)

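# add dummy to data frame: split by id, flag each person, stack the pieces back together
df <- do.call(rbind, lapply(split(df, df$id), function(x) {
  # 1 in each row where the person is employed, 0 otherwise
  x$dummy <- ifelse(x$employment %in% 'Employed', 1L, 0L)
  # 1 in all of the person's rows if both rows are "Employed";
  # this assumes exactly one row per wave (two rows per person)
  x$dummy <- ifelse(sum(x$dummy) == 2L, 1L, 0L)
  return(x)
}))
# drop the row names that rbind builds from the list names
rownames(df) <- NULL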

Output

> head(df)
  id wave employment dummy
1  1   18   Employed     0
2  1   21 Unemployed     0
3  2   18   Employed     1
4  2   21   Employed     1
5  3   18 Unemployed     0
6  3   21   Employed     0
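To make the split / lapply / do.call round trip concrete, here is a small sketch on a hypothetical two-person data frame. It keeps the answer's rule (1 only if both of a person's rows are "Employed") but writes it with as.integer() instead of ifelse().

```r
# Hypothetical two-person panel, used only to illustrate the mechanics
df <- data.frame(
  id = c(1L, 1L, 2L, 2L),
  wave = c(18L, 21L, 18L, 21L),
  employment = c("Employed", "Unemployed", "Employed", "Employed")
)

# split() returns a named list with one data frame per id
pieces <- split(df, df$id)
print(names(pieces))

# lapply() runs the same function on every piece and returns a list
flagged <- lapply(pieces, function(x) {
  x$dummy <- as.integer(sum(x$employment == "Employed") == 2L)
  x
})

# do.call() passes every element of flagged as an argument to rbind(),
# which stacks the pieces back into a single data frame
stacked <- do.call(rbind, flagged)
rownames(stacked) <- NULL
print(stacked)
```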

Upvotes: 1

Yuriy Saraykin

Reputation: 8880

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
  EMPLOYMENT_STATUS = c(
    "Employed",
    "Employed",
    "unemployed",
    "Employed",
    "unemployed",
    "Employed",
    "Employed",
    "Employed",
    "unemployed",
    "unemployed"
  )
)

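library(tidyverse)
df %>%
  group_by(id) %>%
  # per id: 1 if every wave is 18 or 21 and every status is "Employed";
  # the unary + converts TRUE/FALSE to 1L/0L
  mutate(dummy = +(all(wave %in% c(18, 21)) &
                     all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()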
#> # A tibble: 10 x 4
#>       id  wave EMPLOYMENT_STATUS dummy
#>    <int> <int> <chr>             <int>
#>  1     1    18 Employed              1
#>  2     1    21 Employed              1
#>  3     2    18 unemployed            0
#>  4     2    21 Employed              0
#>  5     3    18 unemployed            0
#>  6     3    21 Employed              0
#>  7     4    18 Employed              0
#>  8     4    10 Employed              0
#>  9     5    18 unemployed            0
#> 10     5    21 unemployed            0

Created on 2022-01-23 by the reprex package (v2.0.1)
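One caveat: all(wave %in% c(18, 21)) asks whether every observed wave is 18 or 21, not whether both waves were observed, so a person interviewed only in wave 18 and employed there still gets 1. The sketch below compares that test with the stricter all(c(18, 21) %in% wave). It uses base R's ave() only so it runs without the tidyverse, and the data (including a person seen only once) are hypothetical. Inside the mutate() above, the change would be the same swap of the two sides of %in%.

```r
# Hypothetical panel: id 3 is observed only in wave 18
df <- data.frame(
  id = c(1L, 1L, 2L, 2L, 3L),
  wave = c(18L, 21L, 18L, 21L, 18L),
  EMPLOYMENT_STATUS = c("Employed", "Employed", "Employed", "unemployed", "Employed")
)

# Loose rule: every observed wave is 18 or 21 (a single wave passes)
loose_flag <- function(w, s) all(w %in% c(18, 21)) && all(s == "Employed")
# Strict rule: both waves 18 and 21 are present
strict_flag <- function(w, s) all(c(18, 21) %in% w) && all(s == "Employed")

# Evaluate a rule once per id and repeat the result on each of that id's rows
per_person <- function(df, rule) {
  as.integer(ave(seq_len(nrow(df)), df$id,
                 FUN = function(i) rule(df$wave[i], df$EMPLOYMENT_STATUS[i])))
}

df$loose <- per_person(df, loose_flag)
df$strict <- per_person(df, strict_flag)
print(df)
```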

Upvotes: 0
