Reputation: 16927
Many intro R books and guides start off with the practice of attaching a data.frame so that you can call the variables by name. I have always found it favorable to call variables with $ notation or square bracket slicing [,2]. That way I can use multiple data.frames without confusing them and/or use iteration to successively call columns of interest. I noticed Google recently posted coding guidelines for R which included the line
1) attach: avoid using it
How do people feel about this practice?
Upvotes: 26
Views: 13525
Reputation: 31800
I prefer not to use attach(), as it is far too easy to run a batch of code several times, calling attach() each time. The data frame is added to the search path each time, extending it unnecessarily. Of course, good programming practice is to also detach() at the end of the block of code, but that is often forgotten.
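A minimal sketch of the search-path growth described above. The data frame xxx and its columns are made up for illustration, and the second attach() stands in for running the same script again:

```r
# Hypothetical data frame, used only for illustration
xxx <- data.frame(y = 1:3, z = c(2.1, 3.9, 6.2))

n_before <- length(search())

attach(xxx)
attach(xxx)   # e.g. the same script run a second time

n_after <- length(search())
n_after - n_before        # number of entries added to the search path
sum(search() == "xxx")    # how many entries are named "xxx"

# Each detach() removes only one entry, so every attach() needs its own
detach(xxx)
detach(xxx)
length(search()) == n_before
```

The second attach() also prints a message that y and z are masked, which is usually the first visible sign that the data frame was attached twice.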
Instead, I use xxx$y or xxx[,"y"]. It's more transparent.
Another possibility is to use the data argument available in many functions, which allows individual variables to be referenced within the data frame, e.g., lm(z ~ y, data=xxx).
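The three alternatives mentioned in this answer can be sketched side by side. The contents of xxx are invented for the example; only the access patterns come from the text:

```r
# Hypothetical data frame; the values are illustrative only
xxx <- data.frame(y = c(1, 2, 3, 4), z = c(2.1, 3.9, 6.2, 7.8))

# Explicit column access: always clear which data frame is meant
mean_dollar  <- mean(xxx$y)
mean_bracket <- mean(xxx[, "y"])

# The data argument lets the formula refer to columns by bare name
fit <- lm(z ~ y, data = xxx)
coef(fit)
```

None of these touch the search path, so rerunning the code has no side effects.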
Upvotes: 3
Reputation: 21
While I, too, prefer not to use attach(), it does have its place when you need to persist an object (in this case, a data.frame) through the life of your program and several functions use it. Instead of passing the object into every R function that uses it, I think it is more convenient to keep it in one place and call its elements as needed.
That said, I would only use it if I know how much memory I have available and only if I make sure that I detach() this data.frame once it is out of scope.
Am I making sense?
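One way to make the detach() hard to forget is to pair it with on.exit() inside a wrapper function. This is only a sketch of that idea, not something the answer itself prescribes; the function names, the column names, and the search-path label "analysis_data" are all hypothetical:

```r
# Hypothetical helpers that rely on the attached columns by bare name
mean_age    <- function() mean(age)
mean_income <- function() mean(income)

run_analysis <- function(df) {
  # Attach under an explicit label so it can be detached by name
  attach(df, name = "analysis_data")
  # Runs when the function exits, even if an error occurs
  on.exit(detach("analysis_data", character.only = TRUE), add = TRUE)

  c(mean_age = mean_age(), mean_income = mean_income())
}

# Illustrative data only
survey <- data.frame(age = c(30, 40, 50), income = c(100, 200, 300))
result <- run_analysis(survey)
result
```

The helpers still break if the workspace already contains objects named age or income, so this only addresses the forgotten detach(), not name clashes.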
Upvotes: 2
Reputation: 11341
Just like Leoni said, with and within are perfect substitutes for attach, but I wouldn't completely dismiss it. I use it sometimes, when I'm working directly at the R prompt and want to test some commands before writing them in a script. Especially when testing multiple commands, attach can be a more interesting, convenient and even harmless alternative to with and within, since after you run attach, the command prompt is clear for you to write inputs and see outputs.
Just make sure to detach your data after you're done!
Upvotes: 3
Reputation: 10444
"Attach" is an evil temptation. The only place where it works well is in the classroom setting where one is given a single dataframe and expected to write lines of code to do the analysis on that one dataframe. The user is unlikely to ever use that data again once the assignment is done and handed in.
However, in the real world, more data frames can be added to the collection of data in a particular project. Furthermore, one often copies and pastes blocks of code to be used for something similar. Often one is borrowing from something one did a few months ago and cannot remember the nuances of what was being called from where. In these circumstances one gets drowned by the previous use of "attach."
Upvotes: 7
Reputation: 9050
I never use attach. with and within are your friends.
Example code:
> N <- 3
> df <- data.frame(x1=rnorm(N),x2=runif(N))
> df$y <- with(df,{
    x1+x2
  })
> df
          x1         x2          y
1 -0.8943125 0.24298534 -0.6513271
2 -0.9384312 0.01460008 -0.9238312
3 -0.7159518 0.34618060 -0.3697712
>
> df <- within(df,{
    x1.sq <- x1^2
    x2.sq <- x2^2
    y <- x1.sq+x2.sq
    x1 <- x2 <- NULL
  })
> df
          y        x2.sq     x1.sq
1 0.8588367 0.0590418774 0.7997948
2 0.8808663 0.0002131623 0.8806532
3 0.6324280 0.1198410071 0.5125870
Edit: hadley mentions transform in the comments. Here is some code:
> transform(df, xtot=x1.sq+x2.sq, y=NULL)
       x2.sq       x1.sq       xtot
1 0.41557079 0.021393571 0.43696436
2 0.57716487 0.266325959 0.84349083
3 0.04935442 0.004226069 0.05358049
Upvotes: 25
Reputation: 18487
The main problem with attach is that it can result in unwanted behaviour. Suppose you have an object named xyz in your workspace. Now you attach data frame abc, which has a column named xyz. If your code refers to xyz, can you guarantee whether it refers to the object or to the data frame column? If you don't use attach, it is easy: xyz refers to the object, and abc$xyz refers to the column of the data frame.
One of the main reasons that attach is used frequently in textbooks is that it shortens the code.
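The ambiguity can be reproduced in a few lines. The names xyz and abc follow the answer; the values are invented. With the default attach() position, the workspace is searched before the attached data frame, so the bare name resolves to the workspace object:

```r
xyz <- "workspace object"              # object in the workspace
abc <- data.frame(xyz = c(1, 2, 3))    # data frame with a column of the same name

attach(abc)       # R reports that xyz is masked by .GlobalEnv
bare_xyz <- xyz   # which xyz is this?
detach(abc)

bare_xyz          # the workspace object, not the column
abc$xyz           # unambiguous: the column of abc
```

The masking message is easy to overlook in a long script, which is why the explicit abc$xyz form is safer.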
Upvotes: 8
Reputation: 368231
I much prefer to use with to obtain the equivalent of attach on a single command:
with(someDataFrame, someFunction(...))
This also leads naturally to a form where subset is the first argument:
with(subset(someDataFrame, someVar > someValue),
someFunction(...))
which makes it pretty clear that we operate on a selection of the data. And while many modelling functions have both data and subset arguments, the use above is more consistent, as it also applies to those functions that do not have data and subset arguments.
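A concrete version of the two forms above, as a sketch. The names someDataFrame, someVar and someValue are kept from the answer; the score column and all values are assumptions made up for the example, and mean() stands in for someFunction because it has no data or subset argument of its own:

```r
# Illustrative data only
someDataFrame <- data.frame(
  someVar = c(1, 5, 10, 20),
  score   = c(10, 20, 30, 40)
)
someValue <- 4

# Whole data frame: columns are visible by name inside with()
mean_all <- with(someDataFrame, mean(score))

# Selection first, then the same call on the reduced data
mean_sel <- with(subset(someDataFrame, someVar > someValue), mean(score))

mean_all   # 25
mean_sel   # 30
```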
Upvotes: 13