storaged

Reputation: 1847

Assignment in definition of data.frame

This is not strictly a problem, just something I ran into by accident, but it really intrigues me.

I ran the following line in my console:

sc_matrix <- data.frame(sc_start<-rpois(n=15, 0.4), sc_end<-rpois(n=15, 0.3))

and was really surprised to see this output:

head(sc_matrix, n=5)
#   sc_start....rpois.n...15..0.4. sc_end....rpois.n...15..0.3.
#1                               0                            1
#2                               0                            2
#3                               0                            0
#4                               1                            1
#5                               0                            0

First, I was surprised that the interpreter accepted this without even a warning: the data.frame was created even though I used the <- assignment inside the data.frame constructor.

Second, the column names seem to be created by taking the text of each argument and replacing every non-alphanumeric character with a . (dot).

After reading the discussion comparing the assignment operators, my question is:

How does R handle that line of code? Since there is no = operator, does it evaluate each argument (e.g. sc_start<-rpois(n=15, 0.4)), create a column name from it, and use the value of the right-hand side?

It seems tricky, because I assumed the <- operator does not return any value, so I would have expected the created data.frame to contain something like NULL. I would appreciate any comments on this.

Upvotes: 4

Views: 111

Answers (2)

989

Reputation: 12937

In your example, by

sc_start <- rpois(n=15, 0.4) 

you actually assign the result of rpois(n=15, 0.4) to the variable sc_start. The same holds for sc_end <- rpois(n=15, 0.3).

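After creating the data frame, you will notice that those variables have also been created in your global environment (more precisely, in the environment from which data.frame() was called, which at the console is the global environment).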
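A quick way to confirm this side effect in a fresh session is sketched below; set.seed(42) is an addition only to make the run reproducible, since the actual random values do not matter here.

```r
# Fix the RNG so the run is reproducible (the values themselves are irrelevant)
set.seed(42)

# The `<-` inside each argument is evaluated in the calling environment,
# so sc_start and sc_end are created there as a side effect
sc_matrix <- data.frame(sc_start <- rpois(n = 15, 0.4),
                        sc_end <- rpois(n = 15, 0.3))

exists("sc_start")                    # TRUE
identical(sc_matrix[[1]], sc_start)   # TRUE: column 1 holds the same values
identical(sc_matrix[[2]], sc_end)     # TRUE: column 2 likewise
```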

What you do is basically the same as

data.frame(rpois(n=15, 0.4), rpois(n=15, 0.3))

in which the column names are not specified explicitly, so R creates them automatically (unless fix.empty.names is set to FALSE). The only difference is that you also keep each column's values in a variable, namely sc_start and sc_end.
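The effect of fix.empty.names can be seen directly with a small sketch like the following (the object names df_default and df_unnamed are just illustrative):

```r
set.seed(42)

# Default (fix.empty.names = TRUE): unnamed arguments get names built from
# the deparsed expressions, then sanitized because check.names = TRUE
df_default <- data.frame(rpois(n = 15, 0.4), rpois(n = 15, 0.3))
names(df_default)   # "rpois.n...15..0.4." "rpois.n...15..0.3."

# fix.empty.names = FALSE: unnamed arguments keep an empty name
df_unnamed <- data.frame(rpois(n = 15, 0.4), rpois(n = 15, 0.3),
                         fix.empty.names = FALSE)
names(df_unnamed)   # "" ""
```

Empty column names are awkward to work with, which is why the documentation notes that fix.empty.names needs to be TRUE to guarantee a valid data frame.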

Check the result of

data.frame(x = sc_start <- rpois(n=15, 0.4), y = sc_end <- rpois(n=15, 0.3))

You will notice that the column names are x and y because of the = (which names the arguments), while sc_start and sc_end are still created in your global environment because of the <- operator.

Upvotes: 2

Roland

Reputation: 132676

sc_matrix <- data.frame(sc_start<-rpois(n=15, 0.4), sc_end<-rpois(n=15, 0.3))

To understand what happens here, you need to know that, like almost everything in R (except data objects), <- is actually a function. You can even call it in prefix form, e.g. `<-`(a, 1). This function returns the RHS of the assignment invisibly (see help("<-")), i.e., your assumption that it returns nothing is wrong.
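The points about `<-` being a function with an invisible return value can be checked at the console; this is a small sketch using base R's withVisible() to expose the visibility flag:

```r
# Calling `<-` in prefix form assigns exactly like the infix form
`<-`(a, 1)
a                          # 1

# The assignment's value is the right-hand side, so it can be reused
b <- (a <- 5)
b                          # 5

# ...but that value is returned invisibly, which is why nothing is printed
res <- withVisible(a <- 7)
res$value                  # 7
res$visible                # FALSE

# Parentheses force the invisible value to be printed
(a <- 9)                   # prints [1] 9
```

This is also why data.frame() receives actual vectors rather than NULL: each argument evaluates to the value that was just assigned.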

If you don't pass column names to data.frame (as the LHS of =), it uses substitute to capture each argument's unevaluated expression and deparses it into a name. These names are sanitized (via make.names) if check.names = TRUE, the default. What you observe is essentially the same as if you do something like data.frame(1).
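The naming steps can be mimicked by hand. The sketch below uses quote() in place of the substitute() call that data.frame() performs internally, so it illustrates the mechanism rather than reproducing data.frame's source:

```r
# Capture the argument as an unevaluated expression and turn it into text
expr <- quote(sc_start <- rpois(n = 15, 0.4))
txt <- deparse(expr)
txt                    # "sc_start <- rpois(n = 15, 0.4)"

# check.names = TRUE: make.names() replaces each invalid character with "."
make.names(txt)        # "sc_start....rpois.n...15..0.4."

# Same mechanism for a bare constant: "1" is not a valid name, so "X" is prepended
names(data.frame(1))   # "X1"
```

Note that deparse() normalizes spacing, so "sc_start<-rpois(n=15, 0.4)" becomes "sc_start <- rpois(n = 15, 0.4)". That is why four dots appear between sc_start and rpois in the column name even though the original code had no spaces around <-.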

Upvotes: 4
