Reputation: 467
This morning, while analysing a data frame, I got an error caused by duplicated column names. I tried to find a solution using only dplyr, but I could not find anything that works. Here is an example to illustrate the problem: a data frame with a duplicated column name.
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")
When I try to drop the first column using select(), I get an error:
x %>%
  select(-1) %>%
  filter(b > 1)
Error: found duplicated column name: a
I can get rid of the column easily using traditional indexing and then use dplyr to filter by value:
x <- x[, -1] %>% filter(b > 1)
This produces the desired output:
> x
  a b
1 2 3
2 2 3
Any ideas on how to perform this using only dplyr grammar?
Upvotes: 10
Views: 4534
Reputation: 146
If you want to get rid of the first column completely, I would just do:
x <- x[, 2:3]
Alternatively, you could rename it:
colnames(x)[1] <- "a.1"
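As a rough check that the rename removes the clash, here is a minimal base-R sketch. It assumes the example data frame from the question (built here with an explicit byrow = TRUE) and uses anyDuplicated() only to illustrate the effect; it is not part of the original answer.

```r
# Rebuild the example data frame from the question (assumed layout: rows 1 2 3)
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# anyDuplicated() returns the index of the first duplicated name, or 0 if none
before <- anyDuplicated(names(x))

# Rename the first column by position so every name is unique
colnames(x)[1] <- "a.1"
after <- anyDuplicated(names(x))

print(before)
print(after)
print(names(x))
```

Once the names are unique, the data frame should no longer trigger the duplicated-name error, and the renamed column can be dropped by name if it is not needed.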
Upvotes: 0
Reputation: 3249
This could work, taking advantage of make.names behaviour. I don't know if I've cheated here, but it mostly relies on dplyr functions.
x %>%
setNames(make.names(names(.), unique = TRUE)) %>%
  select(-matches("\\.[0-9]+$"))
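To see what the pipeline is relying on, here is a small base-R sketch of how make.names(unique = TRUE) treats the names from the question. The variable names (nms, fixed, dropped) are only illustrative, and the pattern is a slightly generalised version of the suffix match, so treat it as an assumption rather than the exact expression above.

```r
# Column names from the question, with "a" duplicated
nms <- c("a", "a", "b")

# unique = TRUE appends .1, .2, ... to the later duplicates
fixed <- make.names(nms, unique = TRUE)
print(fixed)

# Names ending in a numeric suffix are the ones a suffix match would drop
dropped <- grepl("\\.[0-9]+$", fixed)
print(fixed[!dropped])
```

Note that the suffix goes on the later duplicates, so this approach keeps the first "a" column and drops the second, which is the opposite of select(-1) in the question. It would also drop any genuine column whose name already ends in a dot followed by digits.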
Upvotes: 3