Harry Wells

Reputation: 179

Passing a string as a data frame column name

I have a data frame called data.df with columns col1, col2, col3, ..., col15. The data frame has no designated class attribute; any of its attributes could potentially be used as the class variable. I would like to use an R variable called target, which points to the column number to be treated as the class, as follows:

target<-data.df$col3

and then use that field (target) as input to several learners, such as PART and J48 from the RWeka package:

part<-PART(target~.,data=data.df,control=Weka_control(M=200,R=FALSE))
j48<-J48(target~.,data=data.df,control=Weka_control(M=200,R=FALSE)) 

The idea is to be able to change 'target' only once at the beginning of my R code. How can this be done?

Upvotes: 18

Views: 40497

Answers (2)

metakermit

Reputation: 22291

I sometimes manage to get a lot done by using strings to reference columns. It works like this:

> df <- data.frame(numbers=seq(5))
> df
  numbers
1       1
2       2
3       3
4       4
5       5
> df$numbers
[1] 1 2 3 4 5
> df[['numbers']]
[1] 1 2 3 4 5

You can then have a variable target hold the name of your desired column as a string. I don't know about RWeka, but many libraries, such as ggplot2, can take string references to columns (e.g. the aes_string function instead of aes).
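A minimal sketch of how this could be applied to the question, assuming a small made-up data.df with three columns instead of col1 to col15. The target string is turned into a formula with base R's as.formula(). The RWeka calls are only shown as comments because they are not run here, and terms() is used just to show which response and predictors the formula resolves to.

```r
# Hypothetical stand-in for the question's data.df (the real one has col1..col15)
data.df <- data.frame(
  col1 = c(1.2, 3.4, 5.6, 7.8),
  col2 = c(10, 20, 30, 40),
  col3 = factor(c("a", "b", "a", "b"))
)

# The only place the class column is chosen
target <- "col3"

# String reference to the column, equivalent to data.df$col3
class_values <- data.df[[target]]

# Build the formula "col3 ~ ." from the string
fml <- as.formula(paste(target, "~ ."))

# With RWeka loaded, the learners from the question could receive fml
# instead of target~. (not run here):
# part <- PART(fml, data = data.df, control = Weka_control(M = 200, R = FALSE))
# j48  <- J48(fml, data = data.df, control = Weka_control(M = 200, R = FALSE))

# Check how the formula expands against data.df:
# col3 is the response, every other column becomes a predictor
tt <- terms(fml, data = data.df)
predictors <- attr(tt, "term.labels")
print(fml)
print(predictors)
```

Because fml is an ordinary formula object, switching the class column means editing only the target line; every call that receives fml picks up the change.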

Upvotes: 23

mbq

Reputation: 18628

If you are asking about using references in R, that is impossible.

However, if you are asking about getting a column whose name is not given explicitly, that is possible with the [ operator, like this:

theNameOfColumnIwantToGetSummaryOf<-"col3"
summary(data.df[,theNameOfColumnIwantToGetSummaryOf])

...or like this:

myIndexOfTheColumnIwantToGetSummaryOf<-3
summary(data.df[,sprintf("col%d",myIndexOfTheColumnIwantToGetSummaryOf)])
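A small sketch of both variants from this answer, assuming a hypothetical three-column data.df. The sprintf() variant only works because the columns follow the colN naming pattern; the last two lines show, as an aside, that the position can also be used directly or turned back into a name with names().

```r
# Hypothetical data frame standing in for data.df (col1..col15 in the question)
data.df <- data.frame(
  col1 = 1:4,
  col2 = c(2.5, 3.5, 4.5, 5.5),
  col3 = c(10, 20, 30, 40)
)

# Variant 1: the column name is held in a variable
col_name <- "col3"
by_name <- data.df[, col_name]

# Variant 2: only the number is known; rebuild the name with sprintf
# (relies on the columns being named col1, col2, ...)
col_index <- 3L
by_built_name <- data.df[, sprintf("col%d", col_index)]

# Without relying on the naming pattern: index by position,
# or look up the name that belongs to that position
by_position <- data.df[, col_index]
looked_up <- names(data.df)[col_index]

print(summary(by_name))
```

All three lookups return the same vector here, so whichever form is kept, the column choice stays in a single variable.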

Upvotes: 6
