Tanja

Reputation: 63

Standardizing at country level using only past data in R

I want to transform the following Stata code to R:

forvalues j=1/17 {
    forvalues i=1870(1)2015 {
        qui sum `var' if country==`j' & year<=`i'
        scalar mean_x = r(mean)
        scalar std_x = r(sd)
        replace stdup_`var'=(`var'-mean_x)/std_x if country==`j' & year==`i'
    }
}

My try:

for (j in 1:17){
  for (i in 1870:2015){
    mean_var <- colMeans(c1[country==j & year <= i, 'var'], na.rm = TRUE)
    sd_var <- sd(as.numeric(unlist(c1[country==j & year <= i, 'var'])), na.rm = TRUE)
    c1 <- c1 %>%
      mutate(stdup = ifelse(country==j & year == i, (var - mean_var)/sd_var, stdup))
  }
}

Is there a nicer and more efficient way to solve it? The code works but the for loop takes approximately 15 seconds, which is not fast enough.

Upvotes: 0

Views: 95

Answers (2)

Nick Cox

Reputation: 37183

As this is flagged Stata too, Stata people might be interested in a way of doing it without loops, using rangestat from SSC. This won't fit easily in a comment.

rangestat (mean) mean_`var'=`var' (sd) std_`var'=`var', int(year . 0) by(country) 

gen stdup_`var' = (`var' - mean_`var') / std_`var'

Here the local macro references `var', as in the question, indicate that this is the core part of a loop over some variable names. Naturally, you can replace those local macro references with a particular variable name if that is wanted instead.

Upvotes: 0

Zhiqiang Wang

Reputation: 6759

It would be easier to answer if the question stated what you want to achieve in R.

My understanding is that you want to generate stdup (a z-score) for each country and year. As mentioned by Nick Cox, there is also a foreach loop over var. I use the mtcars data as an example:

library(dplyr) 
df <- mtcars

To replace three layers of loops, that is, to standardize each of the variables mpg, disp, wt, and qsec within each combination of vs and am:

stdup <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm=TRUE)
new_df <- df %>% group_by(vs, am) %>% 
    mutate(across(c("mpg", "disp", "wt", "qsec"), stdup))
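
Note that the grouped z-score above uses the whole group's mean and sd, whereas the Stata loop in the question only uses observations with year<=`i'. A minimal sketch of that expanding-window variant follows. The helper name expanding_z and the toy data frame are made up for illustration, and it assumes one row per country and year, sorted by year within country:

```r
library(dplyr)

# Hypothetical helper: z-score of each element using the mean and sd of all
# elements up to and including it (missing values are skipped).
expanding_z <- function(x) {
  m <- vapply(seq_along(x), function(i) mean(x[seq_len(i)], na.rm = TRUE), numeric(1))
  s <- vapply(seq_along(x), function(i) sd(x[seq_len(i)], na.rm = TRUE), numeric(1))
  (x - m) / s
}

# Toy data standing in for c1 (assumed layout: one row per country-year).
c1 <- data.frame(
  country = rep(1:2, each = 4),
  year    = rep(1870:1873, times = 2),
  x       = c(1, 2, 3, 4, 10, 10, 20, 40)
)

c1 <- c1 %>%
  arrange(country, year) %>%      # the window relies on year order
  group_by(country) %>%
  mutate(stdup_x = expanding_z(x)) %>%
  ungroup()
```

The first year of each country comes out as NA because the sd of a single observation is undefined, which matches what the Stata loop would give. For several variables at once, something like mutate(across(c(x, y), expanding_z, .names = "stdup_{.col}")) should work in the same pipeline.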

Upvotes: 1
