Reputation: 2375
I have been working on this for an hour and I feel like I have hit a wall: I want to transform a vector of comma-separated strings into a matrix.
I have a vector like:
'ABC,DFGH,IJ'
'KLMN,OP,DFGH,QR'
'ST,ABC'
I want to get a matrix like
ABC DFGH IJ KLMN OP QR ST
1 1 1 0 0 0 0
0 1 0 1 1 1 0
1 0 0 0 0 0 1
Sample data:
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
Base R answers are welcome as well. I might need this trick for some bigger datasets again.
Upvotes: 1
Views: 126
Reputation: 6449
Another base R solution:
> myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
> mv <- strsplit(myvec,",")
> u <- unique(unlist(mv))
> t(sapply(mv, function(x) u %in% x)*1)
# output without colnames
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
> r <- t(sapply(mv, function(x) u %in% x)*1)
# adding colnames
> colnames(r) <- u
> r
ABC DFGH IJ KLMN OP QR ST
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
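Since the question mentions bigger datasets, here is a minimal sketch of the same idea wrapped in a function that fills the matrix by index instead of testing every unique value against every row. The name `to_indicator` and its `sep` argument are made up for illustration; it is one possible variant, not a benchmarked replacement for the code above.

```r
myvec <- c("ABC,DFGH,IJ", "KLMN,OP,DFGH,QR", "ST,ABC")

# Hypothetical helper: builds a 0/1 indicator matrix from delimited strings
to_indicator <- function(x, sep = ",") {
  parts <- strsplit(x, sep, fixed = TRUE)   # list of token vectors, one per input string
  lev <- unique(unlist(parts))              # column labels, in order of first appearance
  m <- matrix(0L, nrow = length(x), ncol = length(lev),
              dimnames = list(NULL, lev))
  rows <- rep(seq_along(parts), lengths(parts))  # row index for every token
  cols <- match(unlist(parts), lev)              # column index for every token
  m[cbind(rows, cols)] <- 1L                     # set all (row, col) pairs in one step
  m
}

r <- to_indicator(myvec)
r
```

A token that appears twice in the same string still gives 1, because the cell is assigned rather than counted.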
Upvotes: 2
Reputation: 11128
You can try this with base R:
Data:
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
Solution:
unq <- unique(strsplit(paste0(myvec,collapse=","),",")[[1]])
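# compare tokens exactly with %in%; grepl() would also match substrings (e.g. "AB" inside "ABC")
spl <- strsplit(myvec, ",")
sapply(unq, function(x) sapply(spl, function(s) x %in% s) + 0)
Output:
> sapply(unq, function(x) sapply(spl, function(s) x %in% s) + 0)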
ABC DFGH IJ KLMN OP QR ST
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
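Whether tokens are compared exactly or by pattern matters as soon as one token is a substring of another. A small sketch of the difference, using made-up data that is not from the question:

```r
x <- c("AB,CD", "ABC")                      # made-up data: "AB" is a substring of "ABC"
parts <- strsplit(x, ",", fixed = TRUE)

# pattern match: flags any string that contains "AB" anywhere
by_pattern <- grepl("AB", x, fixed = TRUE) + 0

# exact match: flags only strings that have "AB" as a whole token
by_exact <- vapply(parts, function(s) "AB" %in% s, logical(1)) + 0

by_pattern   # 1 1
by_exact     # 1 0
```

With the sample data from the question both approaches agree, because none of the tokens is contained in another.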
Upvotes: 1
Reputation: 16121
library(tidyverse)
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
data.frame(myvec) %>%                  # create a data frame
  mutate(id = row_number(),            # create row id (helpful in order to reshape)
         value = 1) %>%                # create value = 1 (helpful in order to reshape)
  separate_rows(myvec) %>%             # one row per value (the default separator splits on the commas)
  spread(myvec, value, fill = 0) %>%   # reshape dataset to wide format
  select(-id)                          # remove row id column
# ABC DFGH IJ KLMN OP QR ST
# 1 1 1 1 0 0 0 0
# 2 0 1 0 1 1 1 0
# 3 1 0 0 0 0 0 1
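In current tidyr releases `spread()` is superseded by `pivot_wider()`. A sketch of the same pipeline with the newer function, assuming tidyr 1.1.0 or later (for the scalar `values_fill`) and that each value appears at most once per string:

```r
library(dplyr)
library(tidyr)

myvec <- c("ABC,DFGH,IJ", "KLMN,OP,DFGH,QR", "ST,ABC")

res <- tibble(myvec) %>%
  mutate(id = row_number(),              # row id, needed to reshape
         value = 1) %>%                  # indicator value
  separate_rows(myvec, sep = ",") %>%    # one row per comma-separated value
  pivot_wider(names_from = myvec,        # values become column names
              values_from = value,
              values_fill = 0) %>%       # absent combinations become 0
  select(-id)                            # drop the row id

res
```

The result is a tibble; `as.matrix(res)` converts it if a matrix is needed.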
Upvotes: 1