val
val

Reputation: 1709

Converting to the right format and counting items in a data frame

How do I convert df into df2, where df is given by:

> df
  ID  VALUES
1  1 a,b,c,d
2  2       a
3  3 c,d,f,g

and df2 should look something like:

> df2
  ID a b c d f g
1  1 1 1 1 1 0 0
2  2 1 0 0 0 0 0
3  3 0 0 1 1 1 1

where the values from df have been broken out into separate columns and 1s and 0s reflect whether or not the ID was associated with that value (from df).

Is there a specific function for this? I thought this is what table() did but if that's the case I can't figure it out.

Upvotes: 2

Views: 40

Answers (1)

IRTFM
IRTFM

Reputation: 263362

Here's a method that uses no extra packages:

 0 + t( sapply(df[['VALUES']], function(x) {
                               letters[1:6] %in% scan(text=x, what="", sep=",") }))
Read 4 items
Read 1 item
Read 4 items
        [,1] [,2] [,3] [,4] [,5] [,6]
a,b,c,d    1    1    1    1    0    0
a          1    0    0    0    0    0
c,d,f,g    0    0    1    1    0    1

It does return a matrix and it does depend on the VALUES column being character rather than factor. If you want to suppress the information messages from scan there is a parmeter for that. You could cbind this with the ID column:

 cbind( df["ID"], 0+ t( sapply(df[['VALUES']], function(x) {letters[1:6] %in% scan(text=x, what="", sep="," , quiet=TRUE) })) )

        ID 1 2 3 4 5 6
a,b,c,d  1 1 1 1 1 0 0
a        2 1 0 0 0 0 0
c,d,f,g  3 0 0 1 1 0 1

Upvotes: 3

Related Questions