Reputation: 78917
I have this string:
x <- c("A B B C")
[1] "A B B C"
I am looking for the shortest way to get this:
[1] "A B C"
I have tried the approach from "Removing duplicate words in a string in R":
paste(unique(x), collapse = ' ')
[1] "A B B C"
# does not work
Background: In a data frame column, I want to count only the unique words.
Upvotes: 2
Views: 572
Reputation: 39657
In case the duplicates are not adjacent, here is another option, also using gsub. Note that it keeps the last occurrence of each word:
x <- c("A B B C")
gsub("\\b(\\S+)\\s+(?=.*\\b\\1\\b)", "", x, perl=TRUE)
#[1] "A B C"
gsub("\\b(\\S+)\\s+(?=.*\\b\\1\\b)", "", "A B B A ABBA", perl=TRUE)
#[1] "B A ABBA"
Upvotes: 3
Reputation: 25323
Another possible solution, based on stringr::str_split, which returns the unique words as a character vector:
library(tidyverse)
str_split(x, " ") %>% unlist %>% unique
#> [1] "A" "B" "C"
Upvotes: 3
Reputation: 887038
A regex-based approach could be shorter: match one or more non-whitespace characters (\\S+) followed by a whitespace character (\\s), capture them, then match one or more occurrences of the backreference. In the replacement, use the backreference to return only a single copy of the match:
gsub("(\\S+\\s)\\1+", "\\1", x)
[1] "A B C"
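One caveat with the pattern above: because it requires trailing whitespace and has no word boundaries, it will likely miss a duplicate at the very end of the string (e.g. "A B B") and can match inside a longer word (e.g. "AB B C"). A minimal sketch of a boundary-aware variant follows; the helper name dedup_adjacent is hypothetical, and it assumes words consist of ordinary word characters so that \\b behaves as expected:

```r
# Hypothetical helper: collapse runs of the same adjacent word into one copy.
# \\b anchors the match at word boundaries; perl = TRUE selects the PCRE engine.
dedup_adjacent <- function(x) {
  gsub("\\b(\\S+)(\\s+\\1\\b)+", "\\1", x, perl = TRUE)
}

dedup_adjacent("A B B C")   # "A B C"
dedup_adjacent("A B B")     # "A B"    (duplicate at end of string)
dedup_adjacent("AB B C")    # "AB B C" (no partial-word match)
```

Like the original pattern, this only handles duplicates that directly follow each other.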
Alternatively, split the string with strsplit, unlist the result, keep the unique elements, and then paste them back together:
paste(unique(unlist(strsplit(x, " "))), collapse = " ")
# [1] "A B C"
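For the background case in the question (counting unique words per row of a data frame column), the same split/unique idea can be applied element-wise. This is only a sketch: the data frame df and its column text are made-up names for illustration, and it assumes words are separated by whitespace:

```r
# Hypothetical example data; `df` and `text` are illustrative names
df <- data.frame(text = c("A B B C", "A A A", "x y x z"),
                 stringsAsFactors = FALSE)

# Split each string on whitespace, then keep the unique words per row
words <- lapply(strsplit(df$text, "\\s+"), unique)

# Deduplicated string and number of unique words for each row
df$dedup    <- vapply(words, paste, character(1), collapse = " ")
df$n_unique <- lengths(words)

df$dedup     # "A B C" "A" "x y z"
df$n_unique  # 3 1 3
```

Unlike the regex approaches, this also handles duplicates that are not adjacent, and it keeps the first occurrence of each word.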
Upvotes: 4