How can I generate a dataset for discrete choice models?

I have a dataset of individuals' actual choices:

library(tibble)

set.seed(123)
data <- tibble(id = 1:100, choice = sample(c('a','b','c','d'), 100, replace = TRUE), x = runif(100, min = 0, max = 100))
head(data)
# A tibble: 6 x 3
     id choice     x
  <int> <chr>  <dbl>
1     1 a      91.1 
2     2 d      87.5 
3     3 d       3.88
4     4 b      32.0 
5     5 d      27.8 
6     6 c      76.3 

id is the id number of an individual; choice is the alternative actually chosen from a, b, c and d; x is some individual characteristic.

To run a specific model, I want to generate a dataset containing both the chosen and the un-chosen alternatives for each individual. The dataset should look like this:

   id choice     x   chosen
    1 a      91.1      1  
    1 b      91.1      0
    1 c      91.1      0
    1 d      91.1      0
    2 a      87.5      0
    2 b      87.5      0
    2 c      87.5      0
    2 d      87.5      1
    3 a       3.88     0
    3 b       3.88     0
    3 c       3.88     0
    3 d       3.88     1

where chosen is a dummy indicating whether that alternative is the one the individual actually chose.

Is there a tidy way to do this?

Thank you so much for your help!


Answers (1)

Allan Cameron

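You can use tidyr::complete to add a row for every id/choice combination that is missing from the data, then flag the originally observed rows:

(Note that the random numbers generated in data differ from those shown in the question, despite the same random seed.)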

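library(dplyr)
library(tidyr)

# Expand to every id/choice pair; rows added by complete() have x = NA
complete(data = data, id, choice) %>%
  group_by(id) %>%
  mutate(chosen = ifelse(is.na(x), 0, 1),   # 1 only on the originally observed row
         x = x[!is.na(x)][1])               # copy the individual's x to all of their rows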
#> # A tibble: 400 x 4
#> # Groups:   id [100]
#>       id choice     x chosen
#>    <int> <chr>  <dbl>  <dbl>
#>  1     1 a       60.0      0
#>  2     1 b       60.0      0
#>  3     1 c       60.0      1
#>  4     1 d       60.0      0
#>  5     2 a       33.3      0
#>  6     2 b       33.3      0
#>  7     2 c       33.3      1
#>  8     2 d       33.3      0
#>  9     3 a       48.9      0
#> 10     3 b       48.9      0
#> # ... with 390 more rows
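
One caveat with the approach above is that it infers chosen from whether x is NA, so a genuinely missing x in the original data would be flagged as un-chosen. A possible alternative, sketched below, builds the id/alternative grid explicitly and compares each alternative with the observed choice. The names alternatives, actual and long are illustrative, and the sketch assumes the full choice set is exactly the set of alternatives observed in the data.

```r
library(dplyr)
library(tidyr)

set.seed(123)
data <- tibble(
  id     = 1:100,
  choice = sample(c("a", "b", "c", "d"), 100, replace = TRUE),
  x      = runif(100, min = 0, max = 100)
)

# Assumption: every possible alternative appears at least once in the data
alternatives <- sort(unique(data$choice))

long <- data %>%
  rename(actual = choice) %>%                      # keep the observed choice under a new name
  expand_grid(choice = alternatives) %>%           # one row per individual per alternative
  mutate(chosen = as.integer(choice == actual)) %>% # 1 if this alternative was the one picked
  select(id, choice, x, chosen) %>%
  arrange(id, choice)

# Sanity check: each individual should have exactly one chosen row
stopifnot(all(tapply(long$chosen, long$id, sum) == 1))
```

Because chosen comes from a direct comparison rather than from NA values, this version does not depend on x being complete, and the result is ungrouped.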
