Reputation: 351
I have the following dataframe that I wish to transform. Currently, it looks like this:
ID Items items.split
1 2729 Bicycle Bicycle
2 3979 TV, Mobile Phone, Bicycle, Water Tank c("TV", "Mobile Phone", "Bicycle", "Water Tank")
3 3860 Mobile Phone, Bicycle, Fan c("Mobile Phone", "Bicycle", "Fan")
4 2357 Mobile Phone, Motorbike c("Mobile Phone", "Motorbike")
5 2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow c("TV", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
6 3277 TV, Mobile Phone, Bicycle, Motorbike, Fan c("TV", "Mobile Phone", "Bicycle", "Motorbike", "Fan")
7 3501 Mobile Phone, Bicycle, Water Tank c("Mobile Phone", "Bicycle", "Water Tank")
8 3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow c("Tractor", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
9 3207 DVD Player, Bicycle, Plow c("DVD Player", "Bicycle", "Plow")
10 3928 TV, Mobile Phone, Bicycle, Fan c("TV", "Mobile Phone", "Bicycle", "Fan")
I want to transform the dataframe above into the following format:
Bicycle TV Mobile Phone Water Tank [etc...]
2729 1 0 0 0
3979 1 1 1 1
3860 1 0 1 0
[etc...]
I don't often work with strings or characters, so I've hit a wall figuring out how to manipulate the items.split variable in particular. I have looked at questions like this one, but I don't want a frequency count of the words on their own; I want the frequency count attached to each ID. So I think what I'm struggling with is integrating something like FreqMat with a simple dplyr command that links a frequency count to every ID.
Any help is greatly appreciated. The data is given below.
structure(list(ID = c(2729L, 3979L, 3860L, 2357L, 2278L, 3277L,
3501L, 3880L, 3207L, 3928L), Items = c("Bicycle", "TV, Mobile Phone, Bicycle, Water Tank",
"Mobile Phone, Bicycle, Fan", "Mobile Phone, Motorbike", "TV, Mobile Phone, Wagon/Cart, Motorbike, Plow",
"TV, Mobile Phone, Bicycle, Motorbike, Fan", "Mobile Phone, Bicycle, Water Tank",
"Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow", "DVD Player, Bicycle, Plow",
"TV, Mobile Phone, Bicycle, Fan"), items.split = list("Bicycle",
c("TV", "Mobile Phone", "Bicycle", "Water Tank"), c("Mobile Phone",
"Bicycle", "Fan"), c("Mobile Phone", "Motorbike"), c("TV",
"Mobile Phone", "Wagon/Cart", "Motorbike", "Plow"), c("TV",
"Mobile Phone", "Bicycle", "Motorbike", "Fan"), c("Mobile Phone",
"Bicycle", "Water Tank"), c("Tractor", "Mobile Phone", "Wagon/Cart",
"Motorbike", "Plow"), c("DVD Player", "Bicycle", "Plow"),
c("TV", "Mobile Phone", "Bicycle", "Fan"))), row.names = c(NA,
10L), class = "data.frame")
Upvotes: 1
Views: 36
Reputation: 388817
You could use cSplit_e from splitstackshape:
splitstackshape::cSplit_e(df, "Items", type = "character", fill = 0, drop = TRUE)
# ID items.split Items_Bicycle Items_DVD Player Items_Fan
#1 2729 Bicycle 1 0 0
#2 3979 TV, Mobile Phone, Bicycle, Water Tank 1 0 0
#3 3860 Mobile Phone, Bicycle, Fan 1 0 1
#4 2357 Mobile Phone, Motorbike 0 0 0
#5 2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow 0 0 0
#6 3277 TV, Mobile Phone, Bicycle, Motorbike, Fan 1 0 1
#7 3501 Mobile Phone, Bicycle, Water Tank 1 0 0
#8 3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow 0 0 0
#9 3207 DVD Player, Bicycle, Plow 1 1 0
#10 3928 TV, Mobile Phone, Bicycle, Fan 1 0 1
# Items_Mobile Phone Items_Motorbike Items_Plow Items_Tractor Items_TV Items_Wagon/Cart Items_Water Tank
#1 0 0 0 0 0 0 0
#2 1 0 0 0 1 0 1
#3 1 0 0 0 0 0 0
#4 1 1 0 0 0 0 0
#5 1 1 1 0 1 1 0
#6 1 1 0 0 1 0 0
#7 1 0 0 0 0 0 1
#8 1 1 1 1 0 1 0
#9 0 0 1 0 0 0 0
#10 1 0 0 0 1 0 0
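cSplit_e keeps the items.split list column and prefixes every new column with Items_. If you want bare item names and no list column, a small cleanup sketch (res is just a placeholder name for the result above) would be:
res <- splitstackshape::cSplit_e(df, "Items", type = "character", fill = 0, drop = TRUE)
res$items.split <- NULL                       # drop the list column
names(res) <- sub("^Items_", "", names(res))  # strip the "Items_" prefix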
Upvotes: 1
Reputation: 886948
We can use table after expanding the list column:
library(dplyr)
library(tidyr)
df1 %>%
  select(-Items) %>%
  unnest(items.split) %>%
  table
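This returns a contingency table object. If a wide data frame with ID as a column is preferred instead, a sketch with tidyr::pivot_wider (still assuming the data is in df1 and the dplyr/tidyr loaded above) could be:
df1 %>%
  select(ID, items.split) %>%
  unnest(items.split) %>%        # one row per ID/item pair
  mutate(n = 1L) %>%             # mark presence
  pivot_wider(names_from = items.split, values_from = n, values_fill = 0L)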
Or in base R, after stacking the named list into a two-column data.frame:
table(stack(setNames(df1$items.split, df1$ID))[2:1])
# values
#ind Bicycle DVD Player Fan Mobile Phone Motorbike Plow Tractor TV Wagon/Cart Water Tank
# 2729 1 0 0 0 0 0 0 0 0 0
# 3979 1 0 0 1 0 0 0 1 0 1
# 3860 1 0 1 1 0 0 0 0 0 0
# 2357 0 0 0 1 1 0 0 0 0 0
# 2278 0 0 0 1 1 1 0 1 1 0
# 3277 1 0 1 1 1 0 0 1 0 0
# 3501 1 0 0 1 0 0 0 0 0 1
# 3880 0 0 0 1 1 1 1 0 1 0
# 3207 1 1 0 0 0 1 0 0 0 0
# 3928 1 0 1 1 0 0 0 1 0 0
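table() returns a table object here too; if a plain data.frame is needed, it can be converted, for example (tab and out are just placeholder names):
tab <- table(stack(setNames(df1$items.split, df1$ID))[2:1])
out <- as.data.frame.matrix(tab)                  # IDs become row names
out <- cbind(ID = as.integer(rownames(out)), out) # move IDs into a column
rownames(out) <- NULL
# note: as.data.frame.matrix() makes column names syntactic by default,
# e.g. "Mobile Phone" becomes "Mobile.Phone"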
Upvotes: 0