Reputation: 1904
I have a data frame that contains duplicate column names. I'm aware that it's non-standard to use duplicated column names, but these names are actually being reassigned downstream using user inputs. For now, I'm attempting to positionally subset a data frame, but the column names become deduplicated. Here's an example.
> df <- data.frame(x = 1:4, y = 2:5, y = LETTERS[2:5], y = (2+(2:5)), check.names = F)
> df
x y y y
1 1 2 B 4
2 2 3 C 5
3 3 4 D 6
4 4 5 E 7
However, when I attempt to subset, the names change...
> df[, 1:3]
x y y.1
1 1 2 B
2 2 3 C
3 3 4 D
4 4 5 E
Is there any way to prevent this from happening? It only occurs when I subset on columns, not rows.
> df[1:3,]
x y y y
1 1 2 B 4
2 2 3 C 5
3 3 4 D 6
Edit for others noticing this behavior:
I've done some digging into the behavior and this relevant section from the help page for extract.data.frame (type ?'['
)
The relevant section states:
If [ returns a data frame it will have unique (and non-missing) row names, if necessary transforming the row names using make.unique. Similarly, if columns are selected column names will be transformed to be unique if necessary (e.g., if columns are selected more than once, or if more than one column of a given name is selected if the data frame has duplicate column names).
This explains the why, appreciate the comments so far on addressing how to best navigate this.
Upvotes: 3
Views: 504
Reputation: 39154
Here is an option, although I think it is not a good idea to have duplicated column names.
as.data.frame(as.list(df)[1:3], check.names = F)
# x y y
# 1 1 2 B
# 2 2 3 C
# 3 3 4 D
# 4 4 5 E
Upvotes: 3