akruug
akruug

Reputation: 31

How to recode a continuous variable into Ranges

I need to recode a continuous variable into categories, usually i use the "cut" Function, but in the cut function i need to specify the breaks. i am looking for a way to have a different set of breaks depending on other categorical variables in my data frame.

the variable in my example is Cost and the "breaks" are in the second table "cost.range", i have a different set of Breaks for each "Region" and each "Category"

Example :

Region    Product     Category Cost
Country A Product 1     CAT A  731
Country B Product 1     CAT A  659
Country C Product 1     CAT A  385
Country D Product 1     CAT A  763
Country A Product 2     CAT A  701
Country B Product 2     CAT A  759
Country C Product 2     CAT A  580
Country D Product 2     CAT A  147
Country A Product 3     CAT B  645
Country B Product 3     CAT B  657
Country C Product 3     CAT B  424


Region       Category  Cost.Range      Range
Country A      CAT A         10          R1
Country A      CAT A         50          R2
Country A      CAT A        200          R3
Country A      CAT A       1000          R4
Country A      CAT B         20          R1
Country A      CAT B        100          R2
Country A      CAT B        400          R3
Country A      CAT B       1500          R4

code to generate the example :

Region <- c("Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D")
Product <- c("Product 1","Product 1","Product 1","Product 1","Product 2","Product 2","Product 2","Product 2","Product 3","Product 3","Product 3","Product 3","Product 4","Product 4","Product 4","Product 4")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B")
Cost <- c(731,659,385,763,701,759,580,147,645,657,424,34,850,463,160,550)

Table1 <- data.frame(Region, Product, Category, Cost)

Region <- c("Country A","Country A","Country A","Country A","Country A","Country A","Country A","Country A")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B")
Cost.range <- c(10,50,200,1000,20,100,400,1500)
Range <- c("R1","R1","R3","R4","R1","R2","R3","R4")

Table2 <- data.frame(Region, Category, Cost.range, Range)

Upvotes: 3

Views: 857

Answers (1)

NGaffney
NGaffney

Reputation: 1532

This is not the most elegant solution (and I'd be interested to see a better method) but it should achieve the result you're looking for.

The select() and distinct() functions from the dplyr package find the possible combinations of Region and Category. These combinations are used to subset the two tables and apply the cut() function to each subset.

library('dplyr')
library('data.table')

dt1 <- data.table(Table1)
dt2 <- data.table(Table2)

t2d <- Table2 %>% select(Region, Category) %>% distinct

for(i in 1:nrow(t2d)){
  dt2_range_subset <- dt2[Region == as.character(t2d$Region[i]) 
                          & Category == t2d$Category[i], Cost.range]
  dt1[Region == as.character(t2d$Region[i]) & Category == t2d$Category[i],
      Cost_factor := cut(as.matrix(Cost), dt2_range_subset)]
}

Upvotes: 1

Related Questions