Reputation: 839
Sample data
sessionid qf Office
12 3 LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3
12 4 DEL2,DEL1,LON1,DEL1
13 5 MAn1,LON1,DEL1,LON1
Here I want to remove the duplicate values in column "Office" within each row.
Expected Output
sessionid qf Office
12 3 LON1,LON2,SEA2,SEA3
12 4 DEL2,DEL1,LON1
13 5 MAn1,LON1,DEL1
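To make the example reproducible, the sample data above can be built directly in R. The name `df` is an assumption (the answers below use both `df` and `df1`). `stringsAsFactors = FALSE` keeps `Office` as character, which `strsplit()` requires; before R 4.0, `data.frame()` would otherwise turn it into a factor.

```r
# Sample data from the question (the name df is an assumption)
df <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE  # keep Office as character so strsplit() works
)
df1 <- df  # same data under the name used by the tidyverse answer
```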
Upvotes: 0
Views: 53
Reputation: 783
Here is a base R way of doing it. It works as you'd expect: first split Office on the comma, remove the duplicates within each row, then paste the pieces back together.
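# Office must be character (not factor) for strsplit() to work
df$Office <- sapply(lapply(strsplit(df$Office, ","),
                           function(x) {
                             unique(x)                 # drop repeated codes
                           }),
                    function(x) {
                      paste(x, collapse = ",")         # rejoin with commas
                    },
                    simplify = TRUE)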
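To see what each step does, here is the same pipeline applied by hand to the first row's string only. `unique()` keeps the first occurrence of each code, so the original order is preserved.

```r
x <- "LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3"

parts  <- strsplit(x, ",")[[1]]         # split on the comma into a character vector
kept   <- unique(parts)                 # "LON1" "LON2" "SEA2" "SEA3"
result <- paste(kept, collapse = ",")   # "LON1,LON2,SEA2,SEA3"
```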
Or with the magrittr pipe %>%:
library(magrittr)  # provides %>% (also attached by dplyr/tidyverse)
df$Office <- df$Office %>%
  strsplit(",") %>%
  lapply(function(x) {unique(x)}) %>%
  sapply(function(x) {paste(x, collapse = ",")}, simplify = TRUE)
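A compact base R variant of the same idea, as a sketch: it swaps in `vapply()`, which declares the output type up front (`character(1)`) and errors if any row does not produce a single string, and passes `fixed = TRUE` so the separator is matched literally rather than as a regular expression. The helper name `dedup_csv()` is hypothetical.

```r
# Hypothetical helper: remove duplicate codes within each comma-separated string
dedup_csv <- function(s, sep = ",") {
  vapply(strsplit(s, sep, fixed = TRUE),                # split each string literally
         function(x) paste(unique(x), collapse = sep),  # dedupe, then rejoin
         character(1))                                  # exactly one string per row
}

office <- c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
            "DEL2,DEL1,LON1,DEL1",
            "MAn1,LON1,DEL1,LON1")
deduped <- dedup_csv(office)
# deduped is c("LON1,LON2,SEA2,SEA3", "DEL2,DEL1,LON1", "MAn1,LON1,DEL1")
```

On the question's data frame this would be used as `df$Office <- dedup_csv(df$Office)`.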
Upvotes: 3
Reputation: 887501
We could use the tidyverse. Split 'Office' by the delimiter and expand to 'long' format, keep the distinct rows, then, grouped by 'sessionid' and 'qf', paste the contents of 'Office' back together.
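library(tidyverse)
# One row per office code; distinct() drops repeated (sessionid, qf, Office) rows
separate_rows(df1, Office, sep = ",") %>%
  distinct() %>%
  group_by(sessionid, qf) %>%
  summarise(Office = toString(Office))
# toString() joins with ", "; paste(Office, collapse = ",") gives no spaces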
# A tibble: 3 x 3
# Groups:   sessionid [?]
#   sessionid    qf Office
#       <int> <int> <chr>
# 1        12     3 LON1, LON2, SEA2, SEA3
# 2        12     4 DEL2, DEL1, LON1
# 3        13     5 MAn1, LON1, DEL1
Upvotes: 2