Reputation: 959
I have:
id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"
I need
id a b c
---------
1 1 1 1
2 0 0 1
3 1 0 1
4 0 1 1
(or equivalent with TRUE/FALSE values)
Is there any way to do this in R? I've looked into strsplit, but that doesn't seem to help.
Upvotes: 3
Views: 1110
Reputation: 269854
Use strsplit to split choice, creating s, and give it DF$id as names. From s, extract a vector of all the levels, all_lev. Then sapply a function over s which creates a factor from each component of s and runs table on it. Finally, transpose the result.
s <- setNames(strsplit(DF$choice, ","), DF$id)
all_lev <- sort(unique(unlist(s)))
m <- t(sapply(s, function(x) table(factor(x, levels = all_lev))))
This gives the following matrix, where the row names are the ids:
> m
a b c
1 1 1 1
2 0 0 1
3 1 0 1
4 0 1 1
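To see why the factor step matters, here is a minimal sketch that inspects the intermediate objects for a single id. It assumes the input is rebuilt directly with data.frame() (rather than read.table()) and uses the illustrative name f, which is not part of the answer above.

```r
# Assumed input, matching the data in the question
DF <- data.frame(id = 1:4,
                 choice = c("a,b,c", "c", "a,c", "b,c"),
                 stringsAsFactors = FALSE)

# Split each choice string and name the pieces by id
s <- setNames(strsplit(DF$choice, ","), DF$id)
all_lev <- sort(unique(unlist(s)))

# One component: id 3 chose "a" and "c"
s[["3"]]

# A factor with the full level set keeps the unused level "b" ...
f <- factor(s[["3"]], levels = all_lev)

# ... so table() reports a count for every level, including 0 for "b"
table(f)
```

Without the explicit levels, table() would only report the values actually present, and the rows could not be stacked into a matrix with consistent columns.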
If you prefer a data frame, then using m from above:
data.frame(id = rownames(m), m)
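The question also accepts TRUE/FALSE values. As a hedged variation on the same idea (not part of the answer as posted), one could test membership with %in% instead of counting with table(). The name m_lgl and the data.frame() input are assumptions made for illustration.

```r
# Assumed input, matching the data in the question
DF <- data.frame(id = 1:4,
                 choice = c("a,b,c", "c", "a,c", "b,c"),
                 stringsAsFactors = FALSE)

s <- setNames(strsplit(DF$choice, ","), DF$id)
all_lev <- sort(unique(unlist(s)))

# For each id, ask which levels occur among its choices.
# vapply() returns a levels-by-ids matrix, so transpose it.
m_lgl <- t(vapply(s, function(x) all_lev %in% x, logical(length(all_lev))))
colnames(m_lgl) <- all_lev

m_lgl
```

If m from the counting approach is already available, m > 0 should give an equivalent logical matrix.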
Note 1: If we knew that the levels were always "a", "b" and "c", then we could hard code all_lev, shortening it to:
s <- setNames( strsplit(DF$choice, ","), DF$id )
m <- t(sapply(s, function(x) table(factor(x, levels = c("a", "b", "c")))))
Note 2: We assumed that DF was this:
Lines <- 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
DF <- read.table(text = Lines, header = TRUE, comment = "-", as.is = TRUE)
Update: Shortened answer.
Upvotes: 0
Reputation: 193627
This is exactly what cSplit_e from my "splitstackshape" package is designed to do.
library(splitstackshape)
cSplit_e(DF, "choice", sep = ",", mode = "binary",
type = "character", fill = 0, drop = TRUE)
# id choice_a choice_b choice_c
# 1 1 1 1 1
# 2 2 0 0 1
# 3 3 1 0 1
# 4 4 0 1 1
This uses DF from @G.Grothendieck's answer as the input:
Lines <- 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
DF <- read.table(text = Lines, header = TRUE, comment = "-", as.is = TRUE)
Upvotes: 8
Reputation: 109924
This assumes, as @kohske did, that your data actually looks like what you provided. If it does not, please use dput in the future to share data:
txt = 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
dat <- setNames(read.table(text=txt, skip = 2, stringsAsFactors = FALSE),
strsplit(strsplit(txt, "\n")[[1]][1], "\\s+")[[1]]
)
library(qdapTools)
matrix2df(mtabulate(unlist(lapply(split(dat[[2]], dat[[1]]),
strsplit, ",\\s*"), recursive=FALSE)), "id")
I have come to dislike nested calls since becoming familiar with magrittr's pipe, %>%, so here is the same thing using the pipe:
library(magrittr)
txt %>% read.table(text=., skip = 2, stringsAsFactors = FALSE) %>%
setNames(strsplit(strsplit(txt, "\n")[[1]][1], "\\s+")[[1]]) %>%
with(split(choice, id)) %>%
lapply(strsplit, ",\\s*") %>%
unlist(recursive=FALSE) %>%
mtabulate %>%
matrix2df("id")
## id a b c
## 1 1 1 1 1
## 2 2 0 0 1
## 3 3 1 0 1
## 4 4 0 1 1
Upvotes: 0
Reputation: 66872
Try this:
txt = 'id choice
----------
1 "a,b,c"
2 "c"
3 "a,c"
4 "b,c"'
library(dplyr)
txt %>% textConnection %>%
read.table(skip = 2, stringsAsFactors = FALSE) %>%
select(V2) %>% unlist %>%
strsplit("[,]") %>%
lapply(function(x) data.frame(t(table(c(x, "a", "b", "c"))>1))) %>%
bind_rows  # replaces the deprecated rbind_all()
Then you'll get:
Source: local data frame [4 x 3]
a b c
1 TRUE TRUE TRUE
2 FALSE FALSE TRUE
3 TRUE FALSE TRUE
4 FALSE TRUE TRUE
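The table(c(x, "a", "b", "c")) > 1 step may look odd at first, so here is a small sketch of what it does for one row. The names x, levs and counts are illustrative, and levs is assumed to be the complete set of possible choices.

```r
x <- c("a", "c")            # the split choices for id 3
levs <- c("a", "b", "c")    # assumed complete set of choices

# Appending every level once guarantees each one shows up in the table,
# so the counts here are a = 2, b = 1, c = 2
counts <- table(c(x, levs))
counts

# A count above 1 means x itself contributed that level
counts > 1
```

One caveat of this trick: a choice that is not listed in levs would still get its own column, but with a count of 1, so it would be reported as FALSE.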
Upvotes: 0