Reputation: 10954
Since data_frame does not check variable names, it is possible to define a data_frame with duplicate column names. But when trying to select columns by name, dplyr complains that there are duplicate names even when there is no ambiguity about the selection. For example, in the selection below there are two columns named var3, but both would be dropped by the desired selection, so it is not clear why dplyr complains, or whether it should.
library(dplyr)

# data_frame() accepts the duplicated name var3 without complaint
df_x = data_frame(var1 = rnorm(100),
                  var2 = rnorm(100),
                  var3 = rnorm(100),
                  var3 = rnorm(100))

df_x %>%
  select(var1:var2)
Upvotes: 1
Views: 243
Reputation: 886938
One option would be to change the column names with make.unique and then select:
df_x %>%
  setNames(., make.unique(names(.))) %>%
  select(var1:var2)
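To see what the renaming step does on its own, it may help to run make.unique on the names alone. This is a minimal sketch in base R; col_names is a hypothetical vector that mirrors names(df_x) from the question, not something taken from the original code.

```r
# Hypothetical vector mirroring names(df_x) from the question
col_names <- c("var1", "var2", "var3", "var3")

# anyDuplicated() returns the index of the first duplicated entry (0 if none)
anyDuplicated(col_names)
# [1] 4

# make.unique() appends ".1", ".2", ... to repeated entries
new_names <- make.unique(col_names)
new_names
# [1] "var1"   "var2"   "var3"   "var3.1"
```

Because the first occurrence keeps its original name, a range such as var1:var2 still refers to the same columns after renaming.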
If we need to select the var3 columns:
df_x %>%
  setNames(., make.unique(names(.))) %>%
  select(matches("^var3")) %>%
  head(2)
#        var3     var3.1
#       (dbl)      (dbl)
#1  1.2590590  0.9784617
#2 -0.7163919 -0.9644718
Upvotes: 4