R: Why does data.frame only give me nice column names if I use the = operator?

Question

These four ways of creating a dataframe look pretty similar to me:

myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
myData2 <- data.frame(a = c(1,2), b = c(3,4))
myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))

But if I print out the column names, I only get the nice column names I would hope for when I use the = operator. In all the other cases, the whole expression becomes the column name, with every non-alphanumeric character replaced by a period:

> colnames(myData1)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData2)
[1] "a" "b"
> colnames(myData3)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData4)
[1] "a...c.1..2." "b...c.3..4."

I've read about differences between <- and = when used in function calls in terms of variable scope, but as far as I can reason (possibly not very far), that doesn't explain this particular behavior.

What accounts for the difference between = and <-?
What accounts for the difference between the prefix and infix versions of =?

zero323 · Accepted Answer

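Inside the argument list of a function call, including a call to data.frame, the infix = is not an assignment operator. It only binds the value you pass to a named parameter, which data.frame then uses as the column name. The prefix form `=`(a, c(1,2)) is different: it is an ordinary call to the assignment function, passed as a single unnamed argument.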
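One way to see this distinction is to capture the calls unevaluated with quote() and inspect how the parser read them. This is only a minimal sketch; the variable names named, arrow and prefix are made up for illustration.

```r
# Capture the calls unevaluated so we can inspect how the parser read them.
named  <- quote(data.frame(a = c(1, 2)))
arrow  <- quote(data.frame(a <- c(1, 2)))
prefix <- quote(data.frame(`=`(a, c(1, 2))))

names(named)    # "" "a": the argument is tagged with the name a
names(arrow)    # NULL: a single unnamed argument
names(prefix)   # NULL: a single unnamed argument

# In the last two cases the unnamed argument is itself a call
# to an assignment function.
arrow[[2]][[1]]   # `<-`
prefix[[2]][[1]]  # `=`
```

Only the infix = inside the argument list produces a named argument; the other two forms hand data.frame an unnamed expression.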

Ignoring data.frame(a = c(1,2), b = c(3,4)), in each of the other calls <- and = are interpreted as ordinary assignments and create the variables a and b in your environment.

> ls()
character(0)
> myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
> ls()
[1] "a"       "b"       "myData1"
> rm(list=ls())
> ls()
character(0)
> myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
> ls()
[1] "a"       "b"       "myData3"
> rm(list=ls())
> ls()
character(0)
> myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))
> ls()
[1] "a"       "b"       "myData4"

The data frame gets the expected values only because <- and = invisibly return the assigned value.

> foo <- `=`(a,c(1,2))
> foo
[1] 1 2
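The invisibility itself can be checked with base R's withVisible(), which returns the value of an expression together with a flag saying whether it would be auto-printed. A small sketch, with res_arrow and res_equal as illustrative names:

```r
# withVisible() returns list(value = ..., visible = ...).
res_arrow <- withVisible(a <- c(1, 2))
res_equal <- withVisible(`=`(b, c(3, 4)))

res_arrow$value    # the assigned vector, 1 2
res_arrow$visible  # FALSE: the result is returned invisibly
res_equal$value    # the assigned vector, 3 4
res_equal$visible  # FALSE
```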

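Because of that, your data.frame calls are equivalent in the values they produce, ignoring the variable-assignment side effect, to a call with unnamed arguments, where the column names are built from the argument expressions themselves: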

> data.frame(c(1,2), c(3, 4))
  c.1..2. c.3..4.
1       1       3
2       2       4

hence the results you see.
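The dotted names can be reproduced by hand. The sketch below assumes that data.frame labels an unnamed argument roughly by deparsing its expression and then sanitising the text with make.names() (the effect of the default check.names = TRUE). It approximates the internals rather than copying them, and exprs and labels are illustrative names.

```r
# Unevaluated argument expressions, as data.frame would receive them.
exprs <- list(
  quote(a <- c(1, 2)),
  quote(`=`(a, c(1, 2))),
  quote(c(1, 2))
)

# Turn each expression back into text; the prefix `=` call deparses as infix.
labels <- vapply(exprs, function(e) paste(deparse(e), collapse = ""), character(1))
labels
# [1] "a <- c(1, 2)" "a = c(1, 2)"  "c(1, 2)"

# Replace every character that is invalid in a syntactic name with ".".
make.names(labels)
# [1] "a....c.1..2." "a...c.1..2."  "c.1..2."
```

The one-dot difference between the <- and = column names comes from <- being two characters long and = only one.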
