Tom

Reputation: 2351

Getting the variables from strings with R formula symbols

I have a very simple question. I want to be able to split "Income*Educ" or "I(Income^2)" into strings containing their respective variables, so: "Income" "Educ" and "Income" respectively. However, I do not know in advance which form the input will take.

The following works for "Income*Educ":

strsplit(gsub("[^[:alnum:] ]", "", str), " +")[[1]]

And this one almost works for "I(Income^2)":

strsplit(gsub("[^A-Za-z]+", "", str), " +")[[1]]

How can I make this work for both forms?

Upvotes: 0

Views: 51

Answers (1)

MrFlick

Reputation: 206486

It's probably better to avoid regular expressions if you want to parse R code. There are plenty of built-in tools for that. If you have a formula like

ff <- . ~ Income*Educ + I(Income^2)

you can get all the variables with

all.vars(ff)

The result includes "." because it appears as the response, but you can filter that out.
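If you start from strings rather than a formula, the same idea should carry over by parsing each string first. The following is only a sketch: `vars_from_string` is a made-up helper name, and it assumes every input string is a syntactically valid R expression (unbalanced parentheses, for example, would make `parse()` fail).

```r
# Hypothetical helper (not from the original post): parse the string as
# R code, then let all.vars() collect the variable names it contains.
vars_from_string <- function(s) {
  all.vars(parse(text = s, keep.source = FALSE))
}

vars_from_string("Income*Educ")   # "Income" "Educ"
vars_from_string("I(Income^2)")   # "Income"

# Filtering the "." out of the formula example:
ff <- . ~ Income*Educ + I(Income^2)
setdiff(all.vars(ff), ".")        # "Income" "Educ"
```

Function names such as `I` are not returned because `all.vars()` reports only variables, not the functions being called.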

Upvotes: 1
