asado23

Reputation: 467

Delete a duplicated column with dplyr

This morning, while doing some analysis with a data frame, I got an error caused by duplicated column names. I tried to find a solution using only dplyr, but could not find anything that works. Here is an example that illustrates the problem: a data frame with a duplicated column name.

# Two identical rows (1, 2, 3); byrow = TRUE fills the matrix row by row
x <- data.frame(matrix(c(1, 2, 3),
                nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

When I try to drop the first column using select, I get an error:

x %>%
  select(-1) %>%
  filter(b > 1)

Error: found duplicated column name: a
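
For reference, the repeated names behind this error can be located with base R before any dplyr verb runs. This is a minimal sketch that rebuilds the example data frame so it runs on its own; the variable names first_dup and dup_names are illustrative only.

```r
# Rebuild the example: two identical rows (1, 2, 3) with a repeated name "a".
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# Index of the first column whose name repeats an earlier one (0 if none).
first_dup <- anyDuplicated(names(x))

# The names that occur more than once.
dup_names <- unique(names(x)[duplicated(names(x))])
```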

I can get rid of the column easily using traditional indexing and then using dplyr to filter by value:

x <- x[, -1] %>% filter(b > 1)

This produces the desired output:

> x
  a b
1 2 3
2 2 3

Any ideas on how to perform this using only dplyr grammar?

Upvotes: 10

Views: 4534

Answers (2)

kpress

Reputation: 146

If you want to get rid of the first column completely, I would just do:

x <- x[, c(2:3)]

Alternatively, you could rename it:

colnames(x)[1] <- "a.1"
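
Renaming by hand gets tedious when several names are repeated. A possible shortcut, sketched here with base R's make.unique, appends a numeric suffix to each later occurrence of a repeated name and leaves unique names untouched. The data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data frame with a repeated column name.
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# Later occurrences of a repeated name get a numeric suffix.
names(x) <- make.unique(names(x))
```

Unlike the manual rename above, this keeps the first column's name and changes the second one.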

Upvotes: 0

Chrisss

Reputation: 3249

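This could work by taking advantage of how make.names behaves: with unique = TRUE it appends a numeric suffix (.1, .2, and so on) to repeated names, so the later duplicates can then be dropped by pattern while the first occurrence is kept. I don't know if I've cheated here, but it relies mostly on dplyr functions. Note that any column whose original name already ends in a dot followed by digits would be dropped too.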

x %>%
    setNames(make.names(names(.), unique = TRUE)) %>%
    select(-matches("\\.[0-9]+$"))
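
To reproduce the output asked for in the question (dropping the first a rather than the later one), one possible variant is to make the names unique first and then use ordinary dplyr verbs. This is only a sketch, assuming dplyr is installed; the name result is illustrative, and the final rename is needed only if the suffix is unwanted.

```r
library(dplyr)

# Rebuild the example data frame with a repeated column name.
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

result <- x %>%
  setNames(make.unique(names(.))) %>%  # names become "a", "a.1", "b"
  select(-1) %>%                       # drop the first column by position
  filter(b > 1) %>%                    # keep rows where b exceeds 1
  rename(a = a.1)                      # restore the original name
```

The renaming step is still base R, so this is not strictly dplyr-only, but every step after it uses dplyr grammar.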

Upvotes: 3
