5th

Reputation: 2375

Vector of comma separated strings to matrix

I have been working on this for an hour and feel like I have hit a wall: I want to transform a vector of comma-separated strings into a matrix.

I have a vector like:

'ABC,DFGH,IJ'
'KLMN,OP,DFGH,QR'
'ST,ABC'

I want to get an indicator matrix with one column per distinct value, like:

ABC DFGH IJ KLMN OP QR ST
1   1    1  0    0  0  0
0   1    0  1    1  1  0
1   0    0  0    0  0  1

Sample data:

myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')

Base R answers are welcome as well. I might need this trick for some bigger datasets again.

Upvotes: 1

Views: 126

Answers (3)

lebatsnok

Reputation: 6449

Another base R solution:

> myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
> mv <- strsplit(myvec,",")
> u <- unique(unlist(mv))
> t(sapply(mv, function(x) u %in% x)*1)
# output without colnames
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    0    1    0    1    1    1    0
[3,]    1    0    0    0    0    0    1
> r <- t(sapply(mv, function(x) u %in% x)*1)
# adding colnames 
> colnames(r) <- u
> r
     ABC DFGH IJ KLMN OP QR ST
[1,]   1    1  1    0  0  0  0
[2,]   0    1  0    1  1  1  0
[3,]   1    0  0    0  0  0  1
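
Since the question mentions reusing this on bigger datasets, the steps above could be wrapped in a small function. This is only a sketch: the name strings_to_indicator and its sep argument are my own choices, not part of the answer above, and it assumes the separator is a literal string rather than a regular expression.

```r
# Hypothetical helper: the name and the `sep` argument are illustrative assumptions.
strings_to_indicator <- function(x, sep = ",") {
  # One character vector of tokens per input string; fixed = TRUE treats sep literally.
  parts <- strsplit(x, sep, fixed = TRUE)
  # Distinct tokens in order of first appearance; these become the columns.
  lev <- unique(unlist(parts))
  # One logical row per input string, bound row-wise into a matrix.
  m <- do.call(rbind, lapply(parts, function(p) lev %in% p))
  # Adding 0L converts TRUE/FALSE to integer 1/0 while keeping the dimensions.
  m <- m + 0L
  colnames(m) <- lev
  m
}

myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')
res <- strings_to_indicator(myvec)
res
```

Using rbind over a list instead of t(sapply(...)) keeps the result a matrix with one row per input string even when there is only a single distinct value.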

Upvotes: 2

PKumar

Reputation: 11128

You can try this with base R:

Data:

myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')

Solution:

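unq <- unique(strsplit(paste0(myvec,collapse=","),",")[[1]])
# %in% tests exact membership; grepl() on the split list would also match substrings (e.g. "AB" inside "ABC")
sapply(unq, function(x) sapply(strsplit(myvec,","), function(s) x %in% s)+0)

Output:

> sapply(unq, function(x) sapply(strsplit(myvec,","), function(s) x %in% s)+0)
     ABC DFGH IJ KLMN OP QR ST
[1,]   1    1  1    0  0  0  0
[2,]   0    1  0    1  1  1  0
[3,]   1    0  0    0  0  0  1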

Upvotes: 1

AntoniosK

Reputation: 16121

library(tidyverse)

myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')

data.frame(myvec) %>%                # create a data frame
  mutate(id = row_number(),          # create row id (helpful in order to reshape)
         value = 1) %>%              # create value = 1 (helpful in order to reshape)
  separate_rows(myvec) %>%           # one row per value (the default separator splits on the commas)
  spread(myvec, value, fill = 0) %>% # reshape dataset
  select(-id)                        # remove row id column

#   ABC DFGH IJ KLMN OP QR ST
# 1   1    1  1    0  0  0  0
# 2   0    1  0    1  1  1  0
# 3   1    0  0    0  0  0  1
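
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same pipeline could be written as below. This is a sketch under a few assumptions: a tidyr version that accepts a scalar values_fill, and an explicit sep = "," and stringsAsFactors = FALSE that I added for safety rather than taken from the code above.

```r
library(dplyr)
library(tidyr)

myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')

res <- data.frame(myvec, stringsAsFactors = FALSE) %>%
  mutate(id = row_number(),             # row id, keeps each original string on its own row
         value = 1) %>%                 # indicator value for every string/value pair
  separate_rows(myvec, sep = ",") %>%   # one row per comma-separated value
  pivot_wider(names_from = myvec,       # one column per distinct value
              values_from = value,
              values_fill = 0) %>%      # combinations that never occur become 0
  select(-id)                           # drop the helper id column

res
```

One difference to be aware of: pivot_wider() orders the new columns by first appearance, whereas spread() sorts them alphabetically. For this sample data the two orders happen to coincide.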

Upvotes: 1
