Bin

Reputation: 547

How to break a string column into separate logical columns in R

n = 1:5
lett = LETTERS[1:5]
value = character(length = 5)
size = numeric(length = 5)
for (i in 1:5)  {
  set.seed(i)
  size[i] = sample(1:5, 1)
  set.seed(i)
  value[i] = paste(sample(lett, size[i]), collapse = ";")
}

dat = data.frame(n, value)

dat

> dat
  n value
1 1   B;E
2 2     A
3 3     A
4 4 C;A;D
5 5   B;C

The data.frame is shown above. I want to reshape it so that each category gets its own Yes/No column:

n   A   B   C   D   E   
1   No  Yes No  No  Yes
2   ...
3   ...
4   ...
5   ...

How can I do this? (Assume the real data has more than 5 categories in value, and that I do not know how many categories there are before cleaning the data.)

Upvotes: 1

Views: 76

Answers (1)

akrun

Reputation: 887028

We can split the 'value' column on ";", use mtabulate from qdapTools to count how often each unique element occurs in each row, turn those counts into a 1/2 index matrix (1 = absent, 2 = present), and use that index to fill in 'No' and 'Yes'.

library(qdapTools)
m1 <- (mtabulate(strsplit(as.character(dat$value), ";"))!=0)+1
m1[] <- c("No", "Yes")[m1]
data.frame(n = 1:nrow(m1), m1)
#  n   A   B   C   D   E
#1 1  No Yes  No  No Yes
#2 2 Yes  No  No  No  No
#3 3 Yes  No  No  No  No
#4 4 Yes  No Yes Yes  No
#5 5  No Yes Yes  No  No
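
If installing qdapTools is not an option, the same split-then-tabulate idea can be sketched in base R. This is only an illustration under some assumptions: the example data is typed in by hand from the question's printed output, and the names parts, lvls, flags and res are made up for this sketch. The categories are taken from the data itself, so they do not need to be known in advance.

```r
# Example data copied by hand from the question's printed output
dat <- data.frame(
  n = 1:5,
  value = c("B;E", "A", "A", "C;A;D", "B;C"),
  stringsAsFactors = FALSE
)

# Split each row's string into its individual categories
parts <- strsplit(as.character(dat$value), ";", fixed = TRUE)

# Discover the set of categories from the data
lvls <- sort(unique(unlist(parts)))

# One row per record, one logical column per category
flags <- do.call(rbind, lapply(parts, function(p) lvls %in% p))
colnames(flags) <- lvls

# Recode TRUE/FALSE as "Yes"/"No"; ifelse keeps the matrix shape and column names
res <- data.frame(n = dat$n, ifelse(flags, "Yes", "No"),
                  stringsAsFactors = FALSE)
res
```

If real logical columns are preferred over "Yes"/"No" strings, data.frame(n = dat$n, flags) keeps the TRUE/FALSE values as they are.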

Upvotes: 2
