Roel

Reputation: 69

Condition-based distribution Sampling

I am working on a dataset for students to practice hypothesis tests. The data should contain fictional processing times for producing a construction equipment vehicle. The vehicle comes in different types and with different options that might influence the processing time. Based on the processing times and the machine specifications, the students will investigate which factors contribute significantly to the processing times and predict the time required to produce a machine with a specific configuration.

The end goal for the dataset is to generate the total processing time per machine. In essence, the total processing time should be the sum of a base time + Option 1 time + Option 2 time + Option 3 time, and so on. Each option time is to be randomly sampled from a distribution so that the effect is not too obvious. Only the total time will be provided to the students, but I need the option times to construct the total time.

I know how to do random sampling with rnorm() and other distributions, but I don't know how to generate data conditionally based on the contents of a column.

The dataset looks something like this.

Machine                  <-   c(1,2,3,4,5,6,7,8,9,10)
Pump.Option              <-   c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter")
Piping.Option            <-   c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping")
Lights.Option            <-   c("Std light", "Std & Additional", "Std & Additional","Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional")
Valve.Option             <-   c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")
Pump.Time                <-   NA       
Piping.Time              <-   NA
Lights.Time              <-   NA
Valve.Time               <-   NA
Total.Time               <-   NA


DF.Sample                <- data.frame(Machine, Pump.Option, Piping.Option, Lights.Option, Valve.Option, Pump.Time, Piping.Time, Lights.Time, Valve.Time, Total.Time)

The times that need to be generated are Pump.Time, Piping.Time and Lights.Time, based on the contents of the columns Pump.Option, Piping.Option and Lights.Option. These times will be used to calculate the total time for that machine.

The times for the options are something like this.

Upvotes: 1

Views: 36

Answers (1)

Joris C.

Reputation: 6234

You could use dplyr's case_when for this, which provides a relatively clean syntax compared to a set of nested ifelse statements:

library(dplyr)

# Each rnorm(n(), ...) draws one value per row; case_when() keeps only the
# draws for rows whose condition matches, and 0 is recycled to all rows.
DF.Sample %>%
    mutate(
        Pump.Time = case_when(
            Pump.Option == "30 Liter" ~ 0,
            Pump.Option == "40 Liter" ~ rnorm(n(), mean = 10, sd = 4),
            Pump.Option == "50 Liter" ~ rnorm(n(), mean = 20, sd = 10)
        ),
        Piping.Time = case_when(
            Piping.Option == "No special piping" ~ 0,
            Piping.Option == "special piping" ~ rnorm(n(), mean = 10, sd = 4)
        ),
        Lights.Time = case_when(
            Lights.Option == "Std light" ~ 0,
            Lights.Option == "Std & Additional" ~ rnorm(n(), mean = 10, sd = 4)
        )
    )
#>    Machine Pump.Option     Piping.Option    Lights.Option Valve.Option
#> 1        1    30 Liter No special piping        Std light Safety valve
#> 2        2    40 Liter No special piping Std & Additional Safety valve
#> 3        3    30 Liter    special piping Std & Additional Normal valve
#> 4        4    30 Liter No special piping Std & Additional Normal valve
#> 5        5    30 Liter    special piping Std & Additional Safety valve
#> 6        6    30 Liter No special piping Std & Additional Normal valve
#> 7        7    50 Liter No special piping        Std light Safety valve
#> 8        8    30 Liter    special piping Std & Additional Safety valve
#> 9        9    30 Liter    special piping Std & Additional Normal valve
#> 10      10    40 Liter No special piping Std & Additional Safety valve
#>    Pump.Time Piping.Time Lights.Time
#> 1   0.000000    0.000000    0.000000
#> 2   4.956528    0.000000   17.716970
#> 3   0.000000   11.051394   10.142101
#> 4   0.000000    0.000000   11.886158
#> 5   0.000000   15.291671    6.745524
#> 6   0.000000    0.000000    5.228694
#> 7  21.520437    0.000000    0.000000
#> 8   0.000000    9.777887    9.222347
#> 9   0.000000   11.219067   14.726647
#> 10 12.761031    0.000000    6.111458
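
The pipeline above only prints the result and stops at the option times. As a rough sketch of how the total could be built, the version below stores the result and adds a Total.Time column. Note that Base.Time and its distribution (mean = 100, sd = 5), the DF.Demo stand-in data frame, and the seed are assumptions made for illustration; none of them come from the question.

```r
library(dplyr)

set.seed(42)  # fixed seed so the generated times are reproducible

# Small stand-in for DF.Sample with the same option columns (illustrative only)
DF.Demo <- data.frame(
    Machine       = 1:4,
    Pump.Option   = c("30 Liter", "40 Liter", "50 Liter", "30 Liter"),
    Piping.Option = c("No special piping", "special piping",
                      "No special piping", "special piping"),
    Lights.Option = c("Std light", "Std & Additional",
                      "Std & Additional", "Std light")
)

DF.Demo <- DF.Demo %>%
    mutate(
        # Assumed base-time distribution; replace with your own values
        Base.Time = rnorm(n(), mean = 100, sd = 5),
        Pump.Time = case_when(
            Pump.Option == "30 Liter" ~ 0,
            Pump.Option == "40 Liter" ~ rnorm(n(), mean = 10, sd = 4),
            Pump.Option == "50 Liter" ~ rnorm(n(), mean = 20, sd = 10)
        ),
        Piping.Time = case_when(
            Piping.Option == "No special piping" ~ 0,
            Piping.Option == "special piping" ~ rnorm(n(), mean = 10, sd = 4)
        ),
        Lights.Time = case_when(
            Lights.Option == "Std light" ~ 0,
            Lights.Option == "Std & Additional" ~ rnorm(n(), mean = 10, sd = 4)
        ),
        # Total is the base time plus every option time
        Total.Time = Base.Time + Pump.Time + Piping.Time + Lights.Time
    )
```

Because rnorm() can return negative values, an option time may occasionally come out below zero; wrapping a draw in pmax(0, ...) is one way to guard against that if it matters for the exercise.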

Data

DF.Sample <- data.frame(
    Machine = c(1,2,3,4,5,6,7,8,9,10), 
    Pump.Option = c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter"),
    Piping.Option = c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping"),
    Lights.Option = c("Std light", "Std & Additional", "Std & Additional","Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional"),
    Valve.Option = c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")
)
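
If you would rather avoid dplyr, a base R alternative is to keep the distribution parameters in a lookup table and let rnorm() vectorise over them. This is only a sketch: the pump_params table and the short Pump.Option vector are hypothetical names introduced here, and the mean and sd values are the ones used in the case_when example.

```r
set.seed(1)  # fixed seed so the draws are reproducible

# Hypothetical lookup table: one mean/sd pair per pump option level
pump_params <- data.frame(
    option = c("30 Liter", "40 Liter", "50 Liter"),
    mean   = c(0, 10, 20),
    sd     = c(0, 4, 10)
)

# Short illustrative option column
Pump.Option <- c("30 Liter", "40 Liter", "30 Liter", "50 Liter", "40 Liter")

# match() returns, for each row, the position of its option in the lookup table
idx <- match(Pump.Option, pump_params$option)

# rnorm() is vectorised over mean and sd, so each row draws from its own
# distribution; with sd = 0 it returns the mean, so "30 Liter" rows get 0
Pump.Time <- rnorm(length(Pump.Option),
                   mean = pump_params$mean[idx],
                   sd   = pump_params$sd[idx])
```

The advantage of this layout is that adding an option level or changing a distribution only means editing one row of the table, not the sampling code.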

Upvotes: 1
