Reputation: 351
I have the following dataframe that I wish to transform. Currently, it looks like this:
ID Items items.split
1 2729 Bicycle Bicycle
2 3979 TV, Mobile Phone, Bicycle, Water Tank c("TV", "Mobile Phone", "Bicycle", "Water Tank")
3 3860 Mobile Phone, Bicycle, Fan c("Mobile Phone", "Bicycle", "Fan")
4 2357 Mobile Phone, Motorbike c("Mobile Phone", "Motorbike")
5 2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow c("TV", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
6 3277 TV, Mobile Phone, Bicycle, Motorbike, Fan c("TV", "Mobile Phone", "Bicycle", "Motorbike", "Fan")
7 3501 Mobile Phone, Bicycle, Water Tank c("Mobile Phone", "Bicycle", "Water Tank")
8 3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow c("Tractor", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
9 3207 DVD Player, Bicycle, Plow c("DVD Player", "Bicycle", "Plow")
10 3928 TV, Mobile Phone, Bicycle, Fan c("TV", "Mobile Phone", "Bicycle", "Fan")
I want to transform the dataframe above into the following format:
Bicycle TV Mobile Phone Water Tank [etc...]
2729 1 0 0 0
3979 1 1 1 1
3860 1 0 1 0
[etc...]
I don't often work with strings or characters, so I've hit a wall figuring out how to manipulate the items.split variable in particular. I have looked at questions like this one, but I don't want a frequency count of the words on their own; I want the frequency count attached to each ID. So I think what I'm struggling with is integrating something like FreqMat with a simple dplyr command that links a frequency count to every ID.
Any help is greatly appreciated. The data is given below.
structure(list(ID = c(2729L, 3979L, 3860L, 2357L, 2278L, 3277L,
3501L, 3880L, 3207L, 3928L), Items = c("Bicycle", "TV, Mobile Phone, Bicycle, Water Tank",
"Mobile Phone, Bicycle, Fan", "Mobile Phone, Motorbike", "TV, Mobile Phone, Wagon/Cart, Motorbike, Plow",
"TV, Mobile Phone, Bicycle, Motorbike, Fan", "Mobile Phone, Bicycle, Water Tank",
"Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow", "DVD Player, Bicycle, Plow",
"TV, Mobile Phone, Bicycle, Fan"), items.split = list("Bicycle",
c("TV", "Mobile Phone", "Bicycle", "Water Tank"), c("Mobile Phone",
"Bicycle", "Fan"), c("Mobile Phone", "Motorbike"), c("TV",
"Mobile Phone", "Wagon/Cart", "Motorbike", "Plow"), c("TV",
"Mobile Phone", "Bicycle", "Motorbike", "Fan"), c("Mobile Phone",
"Bicycle", "Water Tank"), c("Tractor", "Mobile Phone", "Wagon/Cart",
"Motorbike", "Plow"), c("DVD Player", "Bicycle", "Plow"),
c("TV", "Mobile Phone", "Bicycle", "Fan"))), row.names = c(NA,
10L), class = "data.frame")
Upvotes: 1
Views: 36
Reputation: 388817
You could use cSplit_e from splitstackshape:
splitstackshape::cSplit_e(df, "Items", type = "character", fill = 0, drop = TRUE)
# ID items.split Items_Bicycle Items_DVD Player Items_Fan
#1 2729 Bicycle 1 0 0
#2 3979 TV, Mobile Phone, Bicycle, Water Tank 1 0 0
#3 3860 Mobile Phone, Bicycle, Fan 1 0 1
#4 2357 Mobile Phone, Motorbike 0 0 0
#5 2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow 0 0 0
#6 3277 TV, Mobile Phone, Bicycle, Motorbike, Fan 1 0 1
#7 3501 Mobile Phone, Bicycle, Water Tank 1 0 0
#8 3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow 0 0 0
#9 3207 DVD Player, Bicycle, Plow 1 1 0
#10 3928 TV, Mobile Phone, Bicycle, Fan 1 0 1
# Items_Mobile Phone Items_Motorbike Items_Plow Items_Tractor Items_TV Items_Wagon/Cart Items_Water Tank
#1 0 0 0 0 0 0 0
#2 1 0 0 0 1 0 1
#3 1 0 0 0 0 0 0
#4 1 1 0 0 0 0 0
#5 1 1 1 0 1 1 0
#6 1 1 0 0 1 0 0
#7 1 0 0 0 0 0 1
#8 1 1 1 1 0 1 0
#9 0 0 1 0 0 0 0
#10 1 0 0 0 1 0 0
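cSplit_e keeps the items.split list column and prefixes every new column with Items_. If you want bare item names and no list column, a small cleanup sketch (res is just a placeholder name for the result above) would be:
res <- splitstackshape::cSplit_e(df, "Items", type = "character", fill = 0, drop = TRUE)
res$items.split <- NULL                       # drop the list column
names(res) <- sub("^Items_", "", names(res))  # strip the "Items_" prefix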
Upvotes: 1
Reputation: 886948
We can use table after expanding the list column:
library(dplyr)
library(tidyr)
df1 %>%
  select(-Items) %>%
  unnest(items.split) %>%
  table
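This returns a contingency table object. If a wide data frame with ID as a column is preferred instead, a sketch with tidyr::pivot_wider (still assuming the data is in df1 and the dplyr/tidyr loaded above) could be:
df1 %>%
  select(ID, items.split) %>%
  unnest(items.split) %>%        # one row per ID/item pair
  mutate(n = 1L) %>%             # mark presence
  pivot_wider(names_from = items.split, values_from = n, values_fill = 0L)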
Or in base R, after stacking the named list into a two-column data.frame:
table(stack(setNames(df1$items.split, df1$ID))[2:1])
# values
#ind Bicycle DVD Player Fan Mobile Phone Motorbike Plow Tractor TV Wagon/Cart Water Tank
# 2729 1 0 0 0 0 0 0 0 0 0
# 3979 1 0 0 1 0 0 0 1 0 1
# 3860 1 0 1 1 0 0 0 0 0 0
# 2357 0 0 0 1 1 0 0 0 0 0
# 2278 0 0 0 1 1 1 0 1 1 0
# 3277 1 0 1 1 1 0 0 1 0 0
# 3501 1 0 0 1 0 0 0 0 0 1
# 3880 0 0 0 1 1 1 1 0 1 0
# 3207 1 1 0 0 0 1 0 0 0 0
# 3928 1 0 1 1 0 0 0 1 0 0
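table() returns a table object here too; if a plain data.frame is needed, it can be converted, for example (tab and out are just placeholder names):
tab <- table(stack(setNames(df1$items.split, df1$ID))[2:1])
out <- as.data.frame.matrix(tab)                  # IDs become row names
out <- cbind(ID = as.integer(rownames(out)), out) # move IDs into a column
rownames(out) <- NULL
# note: as.data.frame.matrix() makes column names syntactic by default,
# e.g. "Mobile Phone" becomes "Mobile.Phone"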
Upvotes: 0