Simulate data for regression with standardized Y

Question

Based on this topic, I have created a function that returns a dataset with variables related to the outcome (y) by specific linear coefs.

simulate_data_regression <- function(sample=10, coefs=0, error=0){

  n_var <- length(coefs)
  X <- matrix(0, ncol=n_var, nrow=sample)

  beta <- as.matrix(coefs)

  for (i in 1:n_var){
    X[,i] <- scale(rnorm(sample, 0, 1))
  }

  y <- X %*% beta

  if(error != 0){
    y <- y + rnorm(sample, 0, error)
  }


  data = data.frame(X=X)
  names(data) <- paste0("V", 1:n_var)
  data$y <- as.vector(y)

  return(data)
}

data <- simulate_data_regression(sample=50, coefs=c(0.1, 0.8), error=0)
summary(data)
sd(data$V1)
sd(data$y)

It works great. However, I would need to have a standardized y (mean 0 and SD 1). But when I try to scale it, the coefficients change:

data <- simulate_data_regression(sample=50, coefs=c(0.1, 0.8), error=0)
data$y <- as.vector(scale(data$y))
coef(lm(y ~ ., data=data))

It is possible to do such thing? Thank you very much!

Edit

In other words, I would like the coefs that are specified to be standardized coefs (expressed in outcome's SD).

Scaling y a posteriori changes the coefs by 1/sd(y). However, I can't think of any way to change the betas before generating y, so that the betas return to their specified value after the scaling of y.

Edit 2: Failed attempt

I've tried running the function twice, first extracting sd(y) and scaling the coefficients with it, in the hope that those scaled coefficients will change to the specified ones once I'll scale y. But it doens't work, which is expected, as sd(y) changes when I change the coefs :'(

Here's the failed attempt:

simulate_data_regression <- function(sample=10, coefs=0, error=0, standardized=TRUE){

  stuff <- .simulate_data_regression(sample=sample, coefs=coefs, error=error)
  if(standardized == TRUE){
    y_sd <- sd(data$y)
    data <- .simulate_data_regression(sample=sample, coefs=y_sd*coefs, error=error, X=stuff$X)$data
    data$y <- as.vector(scale(data$y))
  } else{
    data <- stuff$data
  }
  return(data)
}


.simulate_data_regression <- function(sample=10, coefs=0, error=0, X=NULL, y=NULL){

  n_var <- length(coefs)

  if(is.null(X)){
    X <- matrix(0, ncol=n_var, nrow=sample)
    for (i in 1:n_var){
      X[,i] <- scale(rnorm(sample, 0, 1))
    }
  }

  beta <- as.matrix(coefs)
  y <- X %*% beta

  if(error != 0){
    y <- y + rnorm(sample, 0, error)
  }


  data = data.frame(X=X)
  names(data) <- paste0("V", 1:n_var)
  data$y <- as.vector(y)

  return(list(X=X, y=y, data=data))
}

Simulate data for regression with standardized Y

Edit

Edit 2: Failed attempt

Answers (1)

Related Questions