AdamNYC

Reputation: 20415

Referring to a newly defined variable within attach()

I would like to perform many modifications on the columns of a data frame. However, with a large number of columns and transformations required, I would like to avoid typing the data frame name over and over.

In a SAS data step, you can create a variable and refer to it right after defining it, all within the same step:

 data A;
 set A;
 varA = varB > 1;
 varC = varA + varB;
 ....
 run;

Is it possible to do this in R?

One way I can think of is to use attach(), create hundreds of vectors, and then cbind() them before calling detach(). I know many R veterans advise against using attach(). But I need to do heavy data manipulation (hundreds of new variables), and calling transform(df,) on every one of them sequentially would be quite cumbersome.

For example:

attach(A)
varA <- varB > 1
varC <- varA + varB
A <- cbind(varA, varB, varC)
detach()

But I am not sure if it is the best way to do this in R.

Upvotes: 2

Views: 106

Answers (2)

mnel

Reputation: 115392

You can use mutate from the plyr package.

A <- data.frame(varB = 1:5)
library(plyr)
A <- mutate(A, varA = varB>1, varC = varA + varB) 
A
  varB  varA varC
1    1 FALSE    1
2    2  TRUE    3
3    3  TRUE    4
4    4  TRUE    5
5    5  TRUE    6
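
For contrast, here is a minimal base R sketch of why the transform() mentioned in the question is awkward for this. It assumes there is no object named varA already sitting in your workspace. transform() evaluates all of its arguments against the original columns, so varC cannot see a varA created in the same call, whereas mutate() evaluates them one after another.

```r
A <- data.frame(varB = 1:5)

# A single transform() call cannot refer to varA, because varA is not yet
# a column of A when the arguments are evaluated.
# (Assumes there is no varA in the calling environment.)
res <- tryCatch(
  transform(A, varA = varB > 1, varC = varA + varB),
  error = function(e) conditionMessage(e)
)
res  # an error message saying varA was not found

# Nesting the calls works, but gets unwieldy with many variables.
A2 <- transform(transform(A, varA = varB > 1), varC = varA + varB)
A2
```

With hundreds of dependent variables, the nested form is exactly the cumbersome pattern the question wants to avoid.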

Or use within() in base R. Note that within() returns the columns you create in reverse order.

A <- data.frame(varB = 1:5)
A <- within(A, {varA <- varB>1; varC <- varA + varB})
A
  varB varC  varA
1    1    1 FALSE
2    2    3  TRUE
3    3    4  TRUE
4    4    5  TRUE
5    5    6  TRUE
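
If the column order matters, one simple option is to reorder by name after within(). A minimal sketch; the target order is just the one from the mutate output, so adjust it to taste:

```r
A <- data.frame(varB = 1:5)
A <- within(A, {
  varA <- varB > 1
  varC <- varA + varB
})

# Select the columns by name in the order you want them
A <- A[, c("varB", "varA", "varC")]
names(A)
```

Selecting by name avoids relying on whatever order within() happens to produce.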

By far and away my favourite is data.table and :=.

library(data.table)
DA <- data.table(varB = 1:5)

# Each [ call adds one column by reference; chaining lets varC see varA
DA[, varA := varB > 1][, varC := varA + varB]

DA
   varB  varA varC
1:    1 FALSE    1
2:    2  TRUE    3
3:    3  TRUE    4
4:    4  TRUE    5
5:    5  TRUE    6

Currently, := is most easily used only once per call to [. There are ways around this, but I think the chain of [ calls is not too hard to follow (and it will be MUCH MUCH faster than mutate or any approach that uses data.frames).
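
One possible way around it, sketched here on the assumption that your data.table version accepts a character vector of column names on the left of :=, is to compute the intermediate value inside braces and return both columns as a list. The local name a is purely illustrative.

```r
library(data.table)

DA <- data.table(varB = 1:5)

# A single [ call: 'a' is a temporary local variable, and the list
# supplies the values for varA and varC, in that order.
DA[, c("varA", "varC") := {
  a <- varB > 1
  list(a, a + varB)
}]

DA
```

This keeps everything in one call, at the cost of separating the column names from the expressions that define them, which may be harder to read with many variables.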

Upvotes: 10

Ricardo Saporta

Reputation: 55350

If you want to create a new variable varC in your data frame A, you can use

A$varC <- A$varB + (A$varB > 1)

Upvotes: 2
