Reputation: 69
I am working on a dataset for students to practice hypothesis tests. The data should contain fictional processing times for producing a construction equipment vehicle. The vehicle comes in different types and with different options that might influence the processing time. Based on the processing times and the machine specifications, the students will investigate which factors contribute significantly to the processing time and predict the time required to produce a machine with a specific configuration.
The end goal for the dataset is to generate the total processing time per machine. In essence, the total processing time should be the sum of a base time + Option 1 time + Option 2 time + Option 3 time, and so on. Each option time is randomly sampled from a distribution so that the pattern is not too obvious. Only the total time will be provided to the students, but I need the option times to construct the total time.
I know how to do random sampling with rnorm() and other distributions, but I don't know how to generate data conditionally, based on the content of a column.
The dataset looks something like this.
Machine <- c(1,2,3,4,5,6,7,8,9,10)
Pump.Option <- c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter")
Piping.Option <- c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping")
Lights.Option <- c("Std light", "Std & Additional", "Std & Additional", "Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional")
Valve.Option <- c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")
Pump.Time <- NA
Piping.Time <- NA
Lights.Time <- NA
Valve.Time <- NA
Total.Time <- NA
DF.Sample <- data.frame(Machine, Pump.Option, Piping.Option, Lights.Option, Valve.Option, Pump.Time, Piping.Time, Lights.Time, Valve.Time, Total.Time)
The times that need to be generated are Pump.Time, Piping.Time and Lights.Time, based on the contents of the columns Pump.Option, Piping.Option and Lights.Option. These times will be used to calculate the total time for that machine.
The times for the options are something like this.
Upvotes: 1
Views: 36
Reputation: 6234
You could use dplyr's case_when() for this, which provides a relatively clean syntax compared to a set of nested ifelse() statements:
library(dplyr)

# Each case_when() maps an option level to 0 or to a random draw
DF.Sample %>%
  mutate(
    Pump.Time = case_when(
      Pump.Option == "30 Liter" ~ 0,
      Pump.Option == "40 Liter" ~ rnorm(n(), mean = 10, sd = 4),
      Pump.Option == "50 Liter" ~ rnorm(n(), mean = 20, sd = 10)
    ),
    Piping.Time = case_when(
      Piping.Option == "No special piping" ~ 0,
      Piping.Option == "special piping" ~ rnorm(n(), mean = 10, sd = 4)
    ),
    Lights.Time = case_when(
      Lights.Option == "Std light" ~ 0,
      Lights.Option == "Std & Additional" ~ rnorm(n(), mean = 10, sd = 4)
    )
  )
#>    Machine Pump.Option     Piping.Option    Lights.Option Valve.Option
#> 1        1    30 Liter No special piping        Std light Safety valve
#> 2        2    40 Liter No special piping Std & Additional Safety valve
#> 3        3    30 Liter    special piping Std & Additional Normal valve
#> 4        4    30 Liter No special piping Std & Additional Normal valve
#> 5        5    30 Liter    special piping Std & Additional Safety valve
#> 6        6    30 Liter No special piping Std & Additional Normal valve
#> 7        7    50 Liter No special piping        Std light Safety valve
#> 8        8    30 Liter    special piping Std & Additional Safety valve
#> 9        9    30 Liter    special piping Std & Additional Normal valve
#> 10      10    40 Liter No special piping Std & Additional Safety valve
#>    Pump.Time Piping.Time Lights.Time
#> 1   0.000000    0.000000    0.000000
#> 2   4.956528    0.000000   17.716970
#> 3   0.000000   11.051394   10.142101
#> 4   0.000000    0.000000   11.886158
#> 5   0.000000   15.291671    6.745524
#> 6   0.000000    0.000000    5.228694
#> 7  21.520437    0.000000    0.000000
#> 8   0.000000    9.777887    9.222347
#> 9   0.000000   11.219067   14.726647
#> 10 12.761031    0.000000    6.111458
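The question's end goal is a Total.Time column built from a base time plus the option times. A minimal sketch of that last step in base R, assuming a hypothetical base time drawn from rnorm() with mean 100 and sd 5 (the question does not state its distribution) and a small stand-in data frame whose option times are illustrative values only:

```r
set.seed(42)  # only so the sketch is reproducible

# Stand-in for a data frame whose option times have already been generated
times <- data.frame(
  Machine     = 1:3,
  Pump.Option = c("30 Liter", "40 Liter", "30 Liter"),
  Pump.Time   = c(0, 4.96, 0),
  Piping.Time = c(0, 0, 11.05),
  Lights.Time = c(0, 17.72, 10.14)
)

# Assumed base time: distribution and parameters are hypothetical
times$Base.Time <- rnorm(nrow(times), mean = 100, sd = 5)

# Total = base time + sum of the per-option times
option_cols <- c("Pump.Time", "Piping.Time", "Lights.Time")
times$Total.Time <- times$Base.Time + rowSums(times[option_cols])

# Version for the students: component times dropped, only the total kept
keep_cols <- setdiff(names(times), c(option_cols, "Base.Time"))
student_data <- times[, keep_cols, drop = FALSE]
```

The same sum could also be written as one more step inside a mutate() call; rowSums() is used here only to keep the sketch free of package dependencies.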
Data
DF.Sample <- data.frame(
  Machine = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Pump.Option = c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter"),
  Piping.Option = c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping"),
  Lights.Option = c("Std light", "Std & Additional", "Std & Additional", "Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional"),
  Valve.Option = c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")
)
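If the number of option levels grows, one possible alternative is to keep the distribution parameters in a small lookup table and let rnorm() draw with a per-row mean and sd. This is only a sketch: the table name pump_params and its layout are made up for illustration, and the means and standard deviations are the ones used in the case_when() call above.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical parameter table: one row per option level.
# mean = 0 with sd = 0 makes rnorm() return exactly 0 for the default option.
pump_params <- data.frame(
  option = c("30 Liter", "40 Liter", "50 Liter"),
  mean   = c(0, 10, 20),
  sd     = c(0, 4, 10)
)

Pump.Option <- c("30 Liter", "40 Liter", "30 Liter", "50 Liter")

# match() finds each machine's row in the parameter table
idx <- match(Pump.Option, pump_params$option)

# rnorm() is vectorised over mean and sd, so every machine gets its own draw
Pump.Time <- rnorm(
  length(Pump.Option),
  mean = pump_params$mean[idx],
  sd   = pump_params$sd[idx]
)
```

An option label that does not appear in the table gives a missing index and therefore a missing time, so the spelling of the labels has to match exactly.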
Upvotes: 1