sudo make install

Reputation: 5669

R: Why does data.frame only give me nice column names if I use the = operator?

These four ways of creating a dataframe look pretty similar to me:

myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
myData2 <- data.frame(a = c(1,2), b = c(3,4))
myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))

But if I print out the column names, I only get the nice column names I would hope for when I use the infix = operator. In all the other cases, the whole expression becomes the column name, with all the non-alphanumerics replaced by periods:

> colnames(myData1)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData2)
[1] "a" "b"
> colnames(myData3)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData4)
[1] "a...c.1..2." "b...c.3..4."

I've read about differences between <- and = when used in function calls in terms of variable scope, but as far as I can reason (possibly not very far), that doesn't explain this particular behavior.

  1. What accounts for the difference between = and <-?
  2. What accounts for the difference between the prefix and infix versions of =?

Upvotes: 0

Views: 83

Answers (2)

IRTFM

Reputation: 263332

When you pass a <- c(1,2) as an argument to data.frame, the argument has a value but no name. The expression is an ordinary call to <-: both a and c(1,2) are passed to <-, which assigns the value to a and returns it, so by the time data.frame collects its arguments into a list, nothing is tagged a. You can think of the symbol a as having already been processed and therefore "used up". The default name in that situation is the deparsed call, cleaned up by make.names.

> make.names(deparse( quote(a <- c(1,2) )) )
[1] "a....c.1..2."
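A minimal sketch of that fallback, assuming only base R. The helper arg_labels is hypothetical (it is not part of data.frame or base R) and only imitates the idea: use the tag when an argument has one, otherwise deparse the expression as written and pass it through make.names.

```r
# Hypothetical helper, for illustration only: label each argument the way
# the explanation above describes.
arg_labels <- function(...) {
  # Capture the arguments unevaluated, exactly as written at the call site
  exprs <- as.list(substitute(list(...)))[-1]
  tags <- names(exprs)
  if (is.null(tags)) tags <- rep("", length(exprs))
  # Turn each expression back into text
  deparsed <- vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                     character(1))
  # Tagged arguments keep their tag; untagged ones get the cleaned-up text
  unname(ifelse(nzchar(tags), tags, make.names(deparsed)))
}

arg_labels(a = c(1, 2), b <- c(3, 4))
```

The first argument carries the tag a, so its label is "a"; the second has no tag, so its label comes from the text b <- c(3, 4).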

Upvotes: 2

zero323

Reputation: 330083

When you call a function, including data.frame, an infix = between a name and a value in the argument list is not an assignment operator. It only tags the value you pass with an argument name.
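The distinction can be checked directly. This is a small sketch, assuming base R only; tag_demo and tags are hypothetical names introduced for the illustration, and the check runs inside a function so it does not depend on what is already in your workspace.

```r
# Hypothetical demo: compare what = and <- do inside an argument list.
tag_demo <- function() {
  tags <- function(...) names(list(...))  # returns the argument tags, if any
  here <- environment()                   # the frame where side effects would land

  named  <- tags(x = c(1, 2))             # = tags the argument as "x"
  x_made <- exists("x", envir = here, inherits = FALSE)

  unnamed <- tags(y <- c(1, 2))           # <- assigns y, leaves the argument untagged
  y_made  <- exists("y", envir = here, inherits = FALSE)

  list(named = named, x_made = x_made, unnamed = unnamed, y_made = y_made)
}

str(tag_demo())
```

Here named is "x" while no variable x exists, and unnamed is NULL while a variable y has been created.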

Ignoring data.frame(a = c(1,2), b = c(3,4)), in each of these calls <- and = are interpreted as normal assignments and create the variables a and b in your environment.

> ls()
character(0)
> myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
> ls()
[1] "a"       "b"       "myData1"
> rm(list=ls())
> ls()
character(0)
> myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
> ls()
[1] "a"       "b"       "myData3"
> rm(list=ls())
> ls()
character(0)
> myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))
> ls()
[1] "a"       "b"       "myData4"

The data frame gets the expected values only because <- and = invisibly return the assigned value.

> foo <- `=`(a,c(1,2))
> foo
[1] 1 2

Because of that, and ignoring the variable-assignment side effect, your data.frame calls hold the same values as

> data.frame(c(1,2), c(3, 4))
  c.1..2. c.3..4.
1       1       3
2       2       4

hence the results you see.
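As a quick check of that equivalence, the sketch below (base R only) compares the column values of the <- form and the = form and then renames the columns by hand; the variable names d1 and d2 are just for illustration.

```r
d1 <- data.frame(a <- c(1, 2), b <- c(3, 4))  # side effect: creates a and b
d2 <- data.frame(a = c(1, 2), b = c(3, 4))    # only tags the columns

# Same column values once the names are set aside
same_values <- identical(unname(as.list(d1)), unname(as.list(d2)))
print(same_values)

# Assigning the names afterwards gives the "nice" column names
names(d1) <- c("a", "b")
print(names(d1))
```

So if you do want the assignment side effect, setting names() afterwards is one way to recover the short column names.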

Upvotes: 2
