Reputation: 71
I have several columns in which each cell is a factor containing multiple words separated by ",". How can I get a list of unique words for each column? For example:
var1 | var2 | var3
a,b | a,b | a,c
a,x | b,s | d,s
a,d | b,m | e,m
And I'd like to have a result in a list/data frame format:
var1 | var2 | var3
[a,b,d,x] | [a,b,s,m] | [a,c,d,s,e,m]
Upvotes: 4
Views: 2616
Reputation: 4024
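Here is a tidy way to do it, assuming each cell already holds a character vector of words:
library(dplyr)
# Example data: every cell is a character vector of words
data <- tibble(
  var1 = list(c("a", "b"), c("a", "x")),
  var2 = list(c("a", "b"), c("b", "s"))
)
# Reshape to long format: one row per (column, row, word)
long_data <- data %>%
  as.list() %>%
  lapply(. %>%
    lapply(. %>% tibble(value = .)) %>%
    bind_rows(.id = "row")) %>%
  bind_rows(.id = "column") %>%
  group_by(column, row) %>%
  mutate(order = 1:n()) %>%
  ungroup()
# Keep one row per unique word in each column; ungroup() above matters,
# because select(-row) on grouped data silently keeps the grouping column
long_data %>%
  distinct(column, value)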
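The example data in this answer is already split into vectors, whereas the question's cells are comma-separated strings. A minimal sketch of getting from one to the other, assuming the question's table is stored in a hypothetical data frame named mydf and that dplyr 1.0 or later is available for across():

```r
library(dplyr)

# Hypothetical input mirroring the question's table
mydf <- data.frame(
  var1 = c("a,b", "a,x", "a,d"),
  var2 = c("a,b", "b,s", "b,m"),
  var3 = c("a,c", "d,s", "e,m"),
  stringsAsFactors = FALSE
)

# Split every cell on "," so each column becomes a list of character vectors;
# as.character() is a no-op here but keeps the call safe for factor columns
data <- mydf %>%
  mutate(across(everything(), ~ strsplit(as.character(.x), ",", fixed = TRUE))) %>%
  as_tibble()
```

After this step, data has the same list-column shape as the example above, so the rest of the pipeline can be applied unchanged.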
Upvotes: 1
Reputation: 193677
You can do this with strsplit + unique in an lapply statement:
lapply(mydf, function(x) unique(trimws(unlist(strsplit(x, ",")))))
## $var1
## [1] "a" "b" "x" "d"
##
## $var2
## [1] "a" "b" "s" "m"
##
## $var3
## [1] "a" "c" "d" "s" "e" "m"
##
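The call above works when the columns of mydf are character vectors; strsplit stops with a "non-character argument" error if it is given a factor, which is what the question describes. A minimal sketch, assuming the question's data is rebuilt by hand as factor columns (the construction of mydf here is my assumption, not part of the original answer):

```r
# Hypothetical reconstruction of the question's data, with factor columns
mydf <- data.frame(
  var1 = c("a,b", "a,x", "a,d"),
  var2 = c("a,b", "b,s", "b,m"),
  var3 = c("a,c", "d,s", "e,m"),
  stringsAsFactors = TRUE
)

# as.character() turns each factor into its labels before splitting
unique_words <- lapply(mydf, function(x) {
  unique(trimws(unlist(strsplit(as.character(x), ",", fixed = TRUE))))
})
```

The as.character() call is harmless for columns that are already character, so the same function covers both cases.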
If you want a single string as a result, add a toString in there, and you can wrap the whole thing in data.frame to get a data.frame instead of a list:
data.frame(lapply(mydf, function(x) toString(unique(trimws(unlist(strsplit(x, ",")))))))
##         var1       var2             var3
## 1 a, b, x, d a, b, s, m a, c, d, s, e, m
If you really need the square brackets and no spaces between the "words", then you can use sprintf + paste. Assuming we had stored the output of the list from the first lapply statement as "temp", then try:
lapply(temp, function(x) sprintf("[%s]", paste(x, collapse = ",")))
## $var1
## [1] "[a,b,x,d]"
##
## $var2
## [1] "[a,b,s,m]"
##
## $var3
## [1] "[a,c,d,s,e,m]"
##
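The splitting and bracketing steps can also be combined to produce the one-row data frame the question asks for. This is only a sketch: the mydf construction and the names bracketed and result are assumptions for illustration, and vapply is swapped in for lapply so that each column is guaranteed to yield exactly one string.

```r
# Hypothetical reconstruction of the question's data as character columns
mydf <- data.frame(
  var1 = c("a,b", "a,x", "a,d"),
  var2 = c("a,b", "b,s", "b,m"),
  var3 = c("a,c", "d,s", "e,m"),
  stringsAsFactors = FALSE
)

# One bracketed string per column, returned as a named character vector
bracketed <- vapply(mydf, function(x) {
  words <- unique(trimws(unlist(strsplit(x, ","))))
  sprintf("[%s]", paste(words, collapse = ","))
}, character(1))

# Turn the named vector into a one-row data frame
result <- as.data.frame(as.list(bracketed))
```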
Upvotes: 5