Reputation: 2077
I have the following data set
df <- data.frame(
path = c("a,b,a",
"(direct) / (none), (direct) / (none), google / cpc, google / cpc",
"f,d",
"a,c"
)
)
and I wish to remove the duplicates so that my output will be
path
1: a, b
2: (direct) / (none), google / cpc
3: f, d
4: a, c
I tried this but it does not work for the second row
setDT(df)
df$path <- sapply(strsplit(as.character(df$path ), split=","), function(x) {
paste(unique(x), collapse = ', ')
})
Upvotes: 3
Views: 2759
Reputation: 24480
You were almost there. The only change needed is to split on ",\\s*" instead of just ",". With a plain "," split, calling unique won't produce the output you want, because otherwise identical strings can differ in their leading whitespace. If you consume that whitespace as part of the split, the problem goes away.
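To make the difference concrete, here is a minimal sketch in base R using the second row from the question. The variable names (x, parts_plain, parts_ws) are only for illustration.

```r
x <- "(direct) / (none), (direct) / (none), google / cpc, google / cpc"

# Splitting on "," keeps the space that follows each comma, so the
# second "(direct) / (none)" carries a leading space and is treated
# as a different string from the first one.
parts_plain <- strsplit(x, split = ",")[[1]]
length(unique(parts_plain))  # 3

# Splitting on ",\\s*" consumes the comma and any whitespace after it,
# so repeated items become identical strings.
parts_ws <- strsplit(x, split = ",\\s*")[[1]]
unique(parts_ws)             # "(direct) / (none)" "google / cpc"
```

Note that this pattern only removes whitespace that follows a comma; whitespace before a comma or at the very start of the string would still need trimws.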
On another note, since you called setDT(df), I guess you are using data.table. If so, use proper data.table syntax to avoid copies:
df[, path := sapply(
  strsplit(as.character(path), split = ",\\s*"),
  function(x) paste(unique(x), collapse = ", "))]
will modify the path column by reference.
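If you want to check the by-reference behaviour yourself, the following is a rough sketch. It assumes the data.table package is installed; dt, dedupe and addr_before are made-up names for this example, and the toy data is not the data from the question.

```r
library(data.table)

# Small illustrative table (hypothetical data)
dt <- data.table(path = c("a,b,a", "x, y,x"))

# Remember where the table lives in memory before the update
addr_before <- address(dt)

# Helper that drops repeated items and joins the rest back together
dedupe <- function(x) paste(unique(x), collapse = ", ")

# := replaces the column in place instead of copying the whole table
dt[, path := sapply(strsplit(path, split = ",\\s*"), dedupe)]

# The table object itself has not been reallocated
identical(addr_before, address(dt))
dt$path
```

By contrast, the df$path <- ... form from the question is the base R assignment route, which is what the := syntax is meant to avoid on a data.table.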
Upvotes: 4
Reputation: 442
It looks like your problem is the leading whitespace in the elements of the second string. Are you trying to preserve that, or are you willing to lose it? If you're willing to lose it, then
df$path <- sapply(strsplit(as.character(df$path), split=","), function(x) {
paste(unique(trimws(x)), collapse = ', ') } )
is what you want:
> df$path <- sapply(strsplit(as.character(df$path), split=","), function(x) {
+ paste(unique(trimws(x)), collapse = ', ') } )
> df$path
[1] "a, b"                            "(direct) / (none), google / cpc"
[3] "f, d"                            "a, c"
>
Upvotes: 2
Reputation: 8413
The basic logic behind the code below:
(i) split each row on ",", (ii) trim whitespace, (iii) take the unique values,
(iv) collapse back with "," using paste0
t = apply(df, 1, function(x) paste0(unique(trimws(unlist(strsplit(x,",")))), collapse = ","))
df=data.frame(t)
# df
# t
#1 a,b
#2 (direct) / (none),google / cpc
#3 f,d
#4 a,c
Upvotes: 1