gogo88
gogo88

Reputation: 309

Compute multiple Pareto charts on multiple columns

I figured out how to create a Pareto chart but the question is if I have a table where I need to compute and diplay the Pareto chats at ones, I got stuck.

dt <- data.table( YEAR = c("2001", "2000", "2001", "2001","1999", "2000", "2000", "1999", "1999"),
                Col1=sample(0:30, 8, rep=TRUE),
                Col2 =sample(0:45, 8, rep=TRUE),
                col33 =sample(0:60, 8, rep=TRUE)
)

I would for instance what to have for each of the Col variable a chart and preferably displayed together.

My code for computing one single chart is:

Col1_gain <- dt[YEAR=="1999"][order (-Col1)]
Col1_gain<-Col1_gain[, !c("Col2","Col33")]
Col1_gain <- Col1_gain[, `:=` (paret=(cumsum(Col1_gain$Col1))*100/sum(Col1_gain$Col1), cum=(cumsum(Col1_gain$Col1)))]
Col1_gain$ID <- seq.int(nrow(Col1_gain))
Col1_pareto <- ggplot(Col1_gain, aes (x=ID/nrow(Col1_gain)))+ geom_line(aes(y=paret), size=1, color="firebrick")
Col1_pareto

Upvotes: 2

Views: 786

Answers (1)

dc37
dc37

Reputation: 16178

From my understanding of your code for plotting one pareto chart, you are isolating a single year and then plot the cumulative sum expressed as a percentage.

So, if you want to do that for multiple years and multiple columns, you need first to pivot your datatable into a longer format (here I'm using pivot_longer from tidyr but you can do the same thing using melt from data.table).

Then, I will group your data by Year and by the categorical variable "Var" (containing Col1, Col2, ...) and expressed the value as a percentage of the total and the cumulative sum in a percentage. I also create a count which is basically row numbers in order to use it as x axis.

Finally, I used these new variables to make the barchart and the line. I separated "Years" using facet_wrap. Altogether, you can write something like that:

dt %>% pivot_longer(., -YEAR, names_to = "Var", values_to = "Val") %>%
  group_by(YEAR, Var) %>%
  arrange(desc(Val), .by_group = TRUE) %>%
  mutate(CumS = cumsum(Val), Count = row_number()) %>%
  mutate(CumS2 = CumS*100 / sum(Val)) %>%
  mutate(Val_Percent = Val*100/sum(Val)) %>%
  ggplot(aes(x = as.factor(Count), y = Val_Percent, fill = Var))+
  geom_col(position = position_dodge())+
  facet_wrap(.~YEAR)+
  geom_line(aes(y = CumS2, group = Var, color = Var), position = position_dodge(.9))+
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_blank())

enter image description here

Alternatively, if you want get a pareto chart per "Col" and per "Year", you can do the exact same thing and then use of facet_grid instead of facet_wrap:

dt %>% pivot_longer(., -YEAR, names_to = "Var", values_to = "Val") %>%
  group_by(YEAR, Var) %>%
  arrange(desc(Val), .by_group = TRUE) %>%
  mutate(CumS = cumsum(Val), Count = row_number()) %>%
  mutate(CumS2 = CumS*100 / sum(Val)) %>%
  mutate(Val_Percent = Val*100/sum(Val)) %>%
  ggplot(aes(x = as.factor(Count), y = Val_Percent, fill = Var))+
  geom_col(position = position_dodge())+
  facet_grid(Var~YEAR)+
  geom_line(aes(y = CumS2, group = Var, color = Var), position = position_dodge(.9))+
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_blank())

enter image description here

Does it answer your question ?

Upvotes: 3

Related Questions