tchakravarty

Reputation: 10954

dplyr: error selecting columns in a data_frame with duplicate names

Since data_frame does not check variable names, it is possible to define a data_frame with duplicate column names. However, when selecting columns by name, dplyr complains about the duplicate names even when the selection itself is unambiguous.

For example, in the selection below there are two columns named var3, but the selection var1:var2 would drop both of them, so it is not clear why dplyr complains, or whether it should.

df_x = data_frame(var1 = rnorm(100),
                  var2 = rnorm(100),
                  var3 = rnorm(100),
                  var3 = rnorm(100))

df_x %>% 
  select(var1:var2)

Upvotes: 1

Views: 243

Answers (1)

akrun

Reputation: 886938

One option would be to change the column names with make.unique and then select:

df_x %>%
  setNames(., make.unique(names(.))) %>%
  select(var1:var2)
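
To see what make.unique does on its own, here is a minimal base R sketch. The vector nms is only an illustration that mirrors the column names of df_x; it is not part of the original example.

```r
# Hypothetical name vector mirroring the column names of df_x
nms <- c("var1", "var2", "var3", "var3")

# make.unique() keeps the first occurrence of each name and appends
# ".1", ".2", ... to later repeats
unique_nms <- make.unique(nms)
print(unique_nms)
# "var1" "var2" "var3" "var3.1"

# anyDuplicated() returns 0 when no duplicates remain
print(anyDuplicated(unique_nms))
```

Because the first var3 keeps its name and only the second becomes var3.1, code that already refers to var3 continues to pick up the first of the two columns.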

If we need to select the var3 columns instead:

df_x %>%
     setNames(., make.unique(names(.)))  %>%
     select(matches("^var3")) %>%
     head(2)  
#       var3     var3.1
#       (dbl)      (dbl)
#1  1.2590590  0.9784617
#2 -0.7163919 -0.9644718

Upvotes: 4
