I would like to perform many modifications on the columns of a data frame. However, because there are many columns and many transformations, I would like to avoid typing the data frame name over and over.
In a SAS DATA step, you can create a variable and refer to it right after defining it:
data A;
set A;
varA = varB > 1;
varC = varA + varB;
....
run;
Is it possible to do this in R?
One way I can think of is to use attach(), create hundreds of vectors, and then cbind() them before calling detach(). I know many R veterans advise against attach(). But I need to do heavy data manipulation (hundreds of new variables), and calling transform(df, ...) on every one of them sequentially would be quite cumbersome.
For example:
attach(A)
varA <- varB > 1
varC <- varA + varB
# data.frame() keeps varA logical; cbind() would return a numeric matrix
A <- data.frame(varA, varB, varC)
detach(A)
But I am not sure if it is the best way to do this in R.
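To see why sequential transform() calls would be needed, here is a minimal sketch (the toy data frame with a single varB column is an assumption): transform() evaluates all of its arguments against the original data frame, so a column created earlier in the same call is not yet visible to later arguments.

```r
# Toy data frame (assumed), mirroring the varB column from the question
A <- data.frame(varB = 1:5)

# In a single transform() call, varA does not exist yet when varC is
# evaluated, so this fails with "object 'varA' not found"
res <- try(transform(A, varA = varB > 1, varC = varA + varB), silent = TRUE)
inherits(res, "try-error")  # returns TRUE

# Chaining two calls works, but the data frame must be named every time
A <- transform(A, varA = varB > 1)
A <- transform(A, varC = varA + varB)
A
```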
You can use plyr and mutate.
A <- data.frame(varB = 1:5)
library(plyr)
A <- mutate(A, varA = varB>1, varC = varA + varB)
A
  varB  varA varC
1    1 FALSE    1
2    2  TRUE    3
3    3  TRUE    4
4    4  TRUE    5
5    5  TRUE    6
Or use within in base R. Notice that within returns the columns you create in reverse order.
A <- data.frame(varB = 1:5)
A <- within(A, {varA <- varB>1; varC <- varA + varB})
A
  varB varC  varA
1    1    1 FALSE
2    2    3  TRUE
3    3    4  TRUE
4    4    5  TRUE
5    5    6  TRUE
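If the reversed column order matters, a minimal base-R sketch (same toy data as above) is to reindex the columns by name after within() returns; selecting by name gives the intended order regardless of how within() arranged the new columns.

```r
A <- data.frame(varB = 1:5)
A <- within(A, {varA <- varB > 1; varC <- varA + varB})

# within() may append the new columns in reverse order (varC before varA);
# selecting the columns by name restores the order they were created in
A <- A[c("varB", "varA", "varC")]
A
```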
By far and away my favourite is data.table and :=.
library(data.table)
DA <- data.table(varB = 1:5)
DA[, varA := varB > 1][, varC := varA + varB]
DA
   varB  varA varC
1:    1 FALSE    1
2:    2  TRUE    3
3:    3  TRUE    4
4:    4  TRUE    5
5:    5  TRUE    6
Currently, := is most easily used only once per call to [. There are ways around this, but I think the chain of [ calls is not too hard to follow (and it will be much faster than mutate or any approach that uses data.frames, because := adds the new columns by reference instead of copying the whole table).
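One of the ways around the one-:=-per-call restriction, sketched here under the assumption that the data.table package is installed: compute the intermediate value inside a braced j expression and assign several columns at once by putting a character vector of names on the left of :=.

```r
library(data.table)

DA <- data.table(varB = 1:5)

# Both new columns are created in a single [ call: the braced j expression
# computes varA first, then returns a list of values for varA and varC
DA[, c("varA", "varC") := {
  a <- varB > 1
  list(a, a + varB)
}]
DA
```

This gives up some of the readability of the chained calls in exchange for a single [ call; with only a few derived columns, the chain is usually easier to read.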
If you want to create a new variable varC in your data frame A, you can use:
A$varC <- A$varB + (A$varB > 1)
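With hundreds of derived columns, the same $-style assignments can also be driven from a list. A minimal sketch, assuming each new column can be written as a function of the data frame built so far (the list name `steps` is hypothetical): Reduce() applies the steps in order, so later columns can use earlier ones, much like statements in a SAS DATA step.

```r
A <- data.frame(varB = 1:5)

# Hypothetical list of derived columns: each function receives the data
# frame as built so far, so later entries can refer to earlier ones
steps <- list(
  varA = function(d) d$varB > 1,
  varC = function(d) d$varA + d$varB
)

# Apply the steps in list order, adding one column per step
A <- Reduce(function(d, nm) {
  d[[nm]] <- steps[[nm]](d)
  d
}, names(steps), init = A)
A
```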