Reputation: 2930
Assume I have a data frame:
df1 = data.frame(df1.a=1:3, df1.b=1:3, df1.c=1:3)
df1.a df1.b df1.c
1 1 1 1
2 2 2 2
3 3 3 3
And create a second one from the first one using different selectors:
df2 = data.frame(df2.a=df1$df1.a, df2.b=df1[,"df1.b"], df2.c=df1["df1.c"])
Why is the name of the third column overridden by the original column name, while the names of the other two are not?
df2.a df2.b df1.c <-- why is this not df2.c??
1 1 1 1
2 2 2 2
3 3 3 3
Upvotes: 0
Views: 51
Reputation: 4650
That is because df1["df1.a"] is a data.frame of one column, whereas df1[,"df1.a"] is a vector.
Try
> class(df1[,"df1.a"])
[1] "integer"
> class(df1["df1.a"])
[1] "data.frame"
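As a self-contained sketch of the same check: the `[[` and `drop = FALSE` forms below are not part of the question and are only added for comparison, and the variable names are made up for illustration.

```r
df1 <- data.frame(df1.a = 1:3, df1.b = 1:3, df1.c = 1:3)

# Two indices (row, column): a single column is dropped to a plain vector
v_comma <- df1[, "df1.a"]
# One index: list-style subsetting, which returns a data frame
d_single <- df1["df1.a"]
# Double brackets extract the column itself, like $
v_double <- df1[["df1.a"]]
# drop = FALSE keeps the data frame even with two indices
d_keep <- df1[, "df1.a", drop = FALSE]

class(v_comma)
class(d_single)
class(v_double)
class(d_keep)
```

Only the forms that return a data.frame carry their own column name into a later data.frame() call.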
According to the documentation:
For a named or unnamed matrix/list/data frame argument that contains a single column, the column name in the result is the column name in the argument.
Therefore, the argument name in
data.frame(…, df2.c=df1["df1.c"])
is "ignored" and the call is treated as
data.frame(…, df1.c=df1$df1.c)
Of course, the argument name is not technically ignored; it just loses out to the name of the single column.
As to why that is: the column naming is complex, as the documentation itself says:
How the names of the data frame are created is complex, and the rest of this paragraph is only the basic story.
For example, try
data.frame(df2.x = df1[c("df1.a", "df1.b")])
df2.x.df1.a df2.x.df1.b
1 1 1
2 2 2
3 3 3
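Putting the three cases side by side, here is a minimal sketch; the argument name `x` and the variables `n_vec`, `n_one` and `n_many` are made up for illustration. The last part shows one possible way to get `df2.c` in the original question, namely passing a vector instead of a one-column data frame.

```r
df1 <- data.frame(df1.a = 1:3, df1.b = 1:3, df1.c = 1:3)

# Vector argument: the argument name becomes the column name
n_vec <- names(data.frame(x = df1[["df1.c"]]))
# One-column data frame: the inner column name wins
n_one <- names(data.frame(x = df1["df1.c"]))
# Multi-column data frame: the argument name is used as a prefix
n_many <- names(data.frame(x = df1[c("df1.a", "df1.b")]))

# For the question: select the third column as a vector so df2.c is kept
df2 <- data.frame(df2.a = df1$df1.a,
                  df2.b = df1[, "df1.b"],
                  df2.c = df1[["df1.c"]])
names(df2)
```

Renaming afterwards with `names(df2)[3] <- "df2.c"` should work as well if the one-column data frame has to be kept in the call.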
(Thanks to Roman for pointing to a better part of the documentation.)
Upvotes: 3