Reputation: 147
I want to add a new column, SubCategory, with values filled in randomly based on the value of the Category column. Here are the details:
Sub_Hair = c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye")
Sub_Beauty = c("Face", "Eye", "Lips")
Sub_Nail= c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit")
Sub_Others = c("Electric", "NonElectric")
> product_data_1[1:10, c("Pcode", "Category", "MRP")]
Pcode Category MRP
1 16156L Beauty $8.88
2 16162M Others $21.27
3 16168M Others $2.98
4 16169E Nail $26.64
5 16207A Hair $6.38
6 17012B Beauty $33.03
7 17012C Beauty $20.58
8 17012F Beauty $36.29
9 17091A Nail $20.55
10 17107D Nail $28.20
I'm trying the code below. However, all rows in a category get the same single subcategory. For example, every row in the "Beauty" category gets "Eye" instead of a value randomly selected from "Face", "Eye" and "Lips". Here's the code and output:
product_data_1 = within(product_data_1, SubCategory[Category == "Beauty"] <- sample(Sub_Beauty, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Hair"] <- sample(Sub_Hair, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Nail"] <- sample(Sub_Nail, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Others"] <- sample(Sub_Others, 1))
> product_data_1[1:10, c("Pcode", "Category", "MRP", "SubCategory")]
Pcode Category MRP SubCategory
1 16156L Beauty $8.88 Eye
2 16162M Others $21.27 Electric
3 16168M Others $2.98 Electric
4 16169E Nail $26.64 NailPolish
5 16207A Hair $6.38 Gel
6 17012B Beauty $33.03 Eye
7 17012C Beauty $20.58 Eye
8 17012F Beauty $36.29 Eye
9 17091A Nail $20.55 NailPolish
10 17107D Nail $28.20 NailPolish
Upvotes: 0
Views: 50
Put your subcategory values in a named list, like subcat_list <- list(Hair = Sub_Hair, Beauty = Sub_Beauty, Nail = Sub_Nail, Others = Sub_Others). You can then use product_data_1$Category to index subcat_list by name, and sapply to call sample on each element of the resulting list of vectors:
set.seed(323)
# as.character() makes the lookup go by name even if Category is a factor
product_data_1$SubCategory <- sapply(subcat_list[as.character(product_data_1$Category)], sample, 1)
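One caveat when indexing a list by Category: if the column is a factor (the default for read.table before R 4.0.0), R indexes by the factor's integer codes rather than by its labels. A minimal sketch of the difference, using a shortened, purely illustrative subcat_list and a two-element Category vector:

```r
# Shortened, illustrative list; the element order differs from the factor levels
subcat_list <- list(Hair = c("Shampoo", "Gel"), Beauty = c("Face", "Eye"))

# Factor levels are sorted alphabetically: Beauty = 1, Hair = 2
Category <- factor(c("Beauty", "Hair"))

# Indexing with the factor uses the integer codes 1 and 2 (positions)
by_factor <- subcat_list[Category]

# Indexing with the labels looks elements up by name
by_name <- subcat_list[as.character(Category)]

names(by_factor)  # "Hair" "Beauty"  (wrong vectors for these rows)
names(by_name)    # "Beauty" "Hair"  (matches Category)
```

Converting with as.character() first keeps the lookup tied to names regardless of how the column was read in.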
You can also try a slightly different approach with dplyr + purrr:
library(tidyverse)
product_data_1 %>%
  # as.character() keeps the lookup by name even if Category is a factor
  mutate(SubCategory = map_chr(as.character(Category), ~ sample(subcat_list[[.]], 1)))
Pcode Category MRP SubCategory
1 16156L Beauty $8.88 Eye
2 16162M Others $21.27 Electric
3 16168M Others $2.98 Electric
4 16169E Nail $26.64 NailPolish
5 16207A Hair $6.38 Gel
6 17012B Beauty $33.03 Eye
7 17012C Beauty $20.58 Lips
8 17012F Beauty $36.29 Face
9 17091A Nail $20.55 ManiPadiKit
10 17107D Nail $28.20 NailArtKit
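As for why the original code repeats one value per category: sample(Sub_Beauty, 1) returns a single value, and R recycles it across every matching row. A minimal sketch of the same per-category assignment that draws one value per row instead, assuming a small toy data frame df (hypothetical, standing in for product_data_1):

```r
# Toy data standing in for product_data_1 (illustrative only)
df <- data.frame(
  Category = c("Beauty", "Others", "Beauty", "Beauty", "Others"),
  stringsAsFactors = FALSE
)
Sub_Beauty <- c("Face", "Eye", "Lips")
Sub_Others <- c("Electric", "NonElectric")

set.seed(1)
df$SubCategory <- NA_character_

# Draw as many values as there are matching rows;
# replace = TRUE allows the same subcategory to appear more than once
is_beauty <- df$Category == "Beauty"
df$SubCategory[is_beauty] <- sample(Sub_Beauty, sum(is_beauty), replace = TRUE)

is_others <- df$Category == "Others"
df$SubCategory[is_others] <- sample(Sub_Others, sum(is_others), replace = TRUE)

df
```

This stays close to the question's code but needs one block per category, which is why the list-based lookup above scales better.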
Upvotes: 1
Reputation: 76460
Here is a base R solution. It uses the split/apply/combine strategy explained in the JSS article "The Split-Apply-Combine Strategy for Data Analysis" by Hadley Wickham.
I will put the Sub_* vectors in a list, Sub_list. Be careful: split orders its result by the levels of Category (alphabetical here: Beauty, Hair, Nail, Others), so Sub_list must hold the vectors in that same order.
# Vectors in the same order as the split groups: Beauty, Hair, Nail, Others
Sub_list <- list(Sub_Beauty, Sub_Hair, Sub_Nail, Sub_Others)
sp <- split(product_data_1, product_data_1$Category)
set.seed(1234)
# Draw one subcategory per row of each group, with replacement
sp <- lapply(seq_along(sp), function(i){
  sp[[i]]$SubCategory <- sample(Sub_list[[i]], nrow(sp[[i]]), replace = TRUE)
  sp[[i]]
})
result <- do.call(rbind, sp)
# Restore the original row order
result <- result[order(as.integer(row.names(result))), ]
result
# Pcode Category MRP SubCategory
#1 16156L Beauty $8.88 Eye
#2 16162M Others $21.27 NonElectric
#3 16168M Others $2.98 NonElectric
#4 16169E Nail $26.64 NailPolish
#5 16207A Hair $6.38 Shampoo
#6 17012B Beauty $33.03 Eye
#7 17012C Beauty $20.58 Face
#8 17012F Beauty $36.29 Lips
#9 17091A Nail $20.55 NailPolishRemover
#10 17107D Nail $28.20 ManiPadiKit
Final clean up.
rm(Sub_list)
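If keeping Sub_list in the same order as the split groups feels fragile, one possible variation is to name the list elements and look each group's vector up by name. The sketch below uses ave(), which performs the split, the per-group call and the recombination in original row order in one step. The names df and Sub_named are illustrative, and the sketch assumes Category is a character column:

```r
# Toy data standing in for product_data_1 (illustrative only)
df <- data.frame(
  Category = c("Beauty", "Others", "Nail", "Hair", "Beauty", "Nail"),
  stringsAsFactors = FALSE
)

# Named list: the names, not the positions, tie each vector to a category
Sub_named <- list(
  Hair   = c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye"),
  Beauty = c("Face", "Eye", "Lips"),
  Nail   = c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit"),
  Others = c("Electric", "NonElectric")
)

set.seed(1234)
# ave() splits the first argument by Category, applies FUN to each group,
# and returns the results in the original row order.
# Inside FUN, g holds the (identical) category labels of one group.
df$SubCategory <- ave(
  df$Category, df$Category,
  FUN = function(g) sample(Sub_named[[g[1]]], length(g), replace = TRUE)
)

df
```

Because the lookup is by name, the order of the elements in Sub_named no longer matters.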
Data
product_data_1 <- read.table(text = "
Pcode Category MRP
1 16156L Beauty $8.88
2 16162M Others $21.27
3 16168M Others $2.98
4 16169E Nail $26.64
5 16207A Hair $6.38
6 17012B Beauty $33.03
7 17012C Beauty $20.58
8 17012F Beauty $36.29
9 17091A Nail $20.55
10 17107D Nail $28.20
", header = TRUE)
Upvotes: 1