Leah Bevis

Reputation: 386

Bring R list into Stata as macro?

I wish to run a Lasso model in R from Stata and then bring a resulting character list (the names of the subset coefficients) back into Stata as a macro (for example, a global).

At the moment I am aware of two options:

  1. I save a dta file and run an R script from Stata using shell:

    shell $Rloc --vanilla <"${LOC}/Lasso.R"
    

    This works from the saved dta file and allows me to run the Lasso model that I wish to run, but is not interactive, so I can't bring the relevant character list (with the names of subset variables) back into Stata.

  2. I run R interactively from Stata using rcall. However, rcall won't allow me to load a large enough matrix, even under max Stata memory. My predictive matrix Z (to be subset by Lasso) is 1,000 by 100 but when I run the command:

    rcall: X <- st.matrix(Z) 
    

    I receive an error stating:

    macro substitution results in line that is too long: The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216 characters, which is calculated on the basis of set maxvar.

Is there some way to interactively run R from Stata, which allows large matrices, such that I may bring a character list from R back into Stata as a macro?

Thanks in advance.

Upvotes: 1

Views: 297

Answers (1)

user8682794

Below I will try to consolidate the comments into a (hopefully) useful answer.

Unfortunately, rcall does not appear to play nicely with large matrices like the one you need. I think it would be best to call R to run your script using the shell command and save the string(s) as variables in a dta file. This requires a bit more work but it is certainly programmable.

Then you could read these variables into Stata and manipulate them easily using built-in functions. For example, you could save the strings in separate variables, or save them all in a single variable and collect them into a macro with levelsof, as @Dimitriy recommended.

Consider the following toy example:

    clear
    set obs 5

    input str50 string
    "this is a string"
    "A longer string is this"
    "A string that is even longer is this one"
    "How many strings do you have?"
    end

    levelsof string, local(newstr) 
    `"A longer string is this"' `"A string that is even longer is this one"' `"How many strings do you have?"' `"this is a string"'

    tokenize `"`newstr'"'

    forvalues i = 1 / `: word count `newstr'' {
        display "``i''"
    }

    A longer string is this
    A string that is even longer is this one
    How many strings do you have?
    this is a string
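
Applied to the workflow in the question, the same idea might look like the sketch below. It assumes that Lasso.R finishes by writing the selected coefficient names to a dta file, one name per row. The file name selected.dta, the variable name varname and the global name SELECTED are hypothetical placeholders, not anything from the original script.

```stata
* Minimal sketch; selected.dta, varname and SELECTED are hypothetical names.
* Assumes Lasso.R writes the selected names to "${LOC}/selected.dta",
* one name per row, in a string variable called varname.
shell $Rloc --vanilla <"${LOC}/Lasso.R"

preserve
use "${LOC}/selected.dta", clear

* clean drops the compound quotes, which is safe here because
* variable names contain no spaces
levelsof varname, local(sel) clean
restore

* the local survives restore, so it can be copied into a global
global SELECTED `sel'
display "$SELECTED"
```

Note that levelsof returns the distinct values in sorted order, so the macro will not preserve the order in which R wrote the names.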

In my experience, programs like rcall and rsource are useful for simple tasks. However, they can become a real hassle for more complicated work, in which case I personally just resort to the real thing, that is, using the other software directly.

As @Dimitriy also indicated, there are now some community-contributed commands available for lasso, which may cover your needs so that you do not have to fiddle with R:

    search lasso

    5 packages found (Stata Journal and STB listed first)
    -----------------------------------------------------

    elasticregress from http://fmwww.bc.edu/RePEc/bocode/e
        'ELASTICREGRESS': module to perform elastic net regression, lasso
        regression, ridge regression / elasticregress calculates an elastic
        net-regularized / regression: an estimator of a linear model in which
        larger / parameters are discouraged.  This estimator nests the LASSO / and

    lars from http://fmwww.bc.edu/RePEc/bocode/l
        'LARS': module to perform least angle regression / Least Angle Regression
        is a model-building algorithm that / considers parsimony as well as
        prediction accuracy.  This / method is covered in detail by the paper
        Efron, Hastie, Johnstone / and Tibshirani (2004), published in The Annals

    lassopack from http://fmwww.bc.edu/RePEc/bocode/l
        'LASSOPACK': module for lasso, square-root lasso, elastic net, ridge,
        adaptive lasso estimation and cross-validation / lassopack is a suite of
        programs for penalized regression / methods suitable for the
        high-dimensional setting where the / number of predictors p may be large

    pdslasso from http://fmwww.bc.edu/RePEc/bocode/p
        'PDSLASSO': module for post-selection and post-regularization OLS or IV
        estimation and inference / pdslasso and ivlasso are routines for
        estimating structural / parameters in linear models with many controls
        and/or / instruments. The routines use methods for estimating sparse /

    sivreg from http://fmwww.bc.edu/RePEc/bocode/s
        'SIVREG': module to perform adaptive Lasso with some invalid instruments /
        sivreg estimates a linear instrumental variables regression / where some
        of the instruments fail the exclusion restriction / and are thus invalid.
        The LARS algorithm (Efron et al., 2004) is / applied as long as the Hansen
Upvotes: 3
