Sharath
Sharath

Reputation: 2267

Programmatically plot sub events of top events using ggplot: R

I have a df like this:

library(dplyr)
library(data.table)
library(ggplot2)
library(scales)

Events <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","D_1","D_2","D_3","E_1","E_2","E_3")
Percentage_Occur <- c(4.6,6.5,1.3,3.4,6.6,5.5,7.8,2.2,2.4,2.1,6.6,5.9,4.9,11.1,4.3)
df1 <- data.frame(Events,Percentage_Occur)

I am trying to combines these sub events into individual category to determine the top 3 events and hence I do it like this:

df2 <- data.frame(df1)
df2$Events <- gsub("*_.*", "\\1", df2$Events)
df2 <- df2 %>% group_by(Events) %>% summarise_each(funs(sum(., na.rm = T)))
df2 <- df2[ order(-df2[,2]), ]

ggplot(df2, aes(reorder(Events,Percentage_Occur), Percentage_Occur)) + 
  geom_bar(stat = "identity") + coord_flip() +
  xlab("Events")

enter image description here

Once I am able to visualize the top 3 events (In this case they are E,B,D), I would like to plot the sub events of these top 3events in a separate plot like this.

enter image description here

I manually did this by extracting the sub events for the top events. Since this is a sample dataset, I was able to do it. I would like to apply the logic to a bigger dataset that I have.

All I want is to determine the top events (which I am able to do), and then somehow programatically plot the corresponding sub events for those top events without manually looking at the dataframe and extracting it. One of the reason is because these events change over time and would like to run this type of logic every 2 hours to see any new events that get to the top. Kindly please provide your inputs on this and help me in moving forward.

Note: I would also like to have better colors.

Upvotes: 2

Views: 124

Answers (1)

Jaap
Jaap

Reputation: 83225

In order to automatically extract the top 3 events, you can process your data with:

library(data.table)
library(splitstackshape)

dt <- cSplit(df1, "Events", sep="_", drop=FALSE)[, tot := sum(Percentage_Occur), Events_1
                                                 ][order(-tot,Events_1,-Percentage_Occur)
                                                   ][, top := rleid(tot)
                                                     ][top <= 3]

And then with the help of scale_fill_manual and reorder(), you will get a plot with nicer colors & also ordered bars within the groups:

# create a vector for the labels of the events in the legend
# this is needed to get the legend in the right order
lbls <- dt$Events

# create the plot
ggplot(dt, aes(x=reorder(Events_1,-tot), y=Percentage_Occur, fill=reorder(Events,-Percentage_Occur), color="black")) +
  geom_bar(stat="identity", position = position_dodge(width=0.8), width=0.7) +
  geom_text(aes(y=1, label=Events), position = position_dodge(width=0.8), angle=90) +
  labs(title="Occurence by Event", x="Events", y="Percentage Occurance") +
  scale_color_manual(values = "black") +
  scale_fill_manual("Events", breaks = lbls,
                    values = c('#d53e4f','#3288bd','#fee08b','#ffffbf','#66c2a5','#f46d43','#fdae61','#abdda4','#e6f598')) +
  guides(color=FALSE) +
  theme_minimal()

which will result in:

enter image description here

You can adapt the color-values in scale_fill_manual to your own preferences.


The data preparation can also be done with dplyr/tidyr (although you will still need data.table for rleid function):

library(dplyr)
library(tidyr)
library(data.table)

df1 <- df1 %>% separate(Events, c("Event","Subevent"), sep="_", remove=FALSE) %>%
  ungroup() %>%
  group_by(Event) %>%
  mutate(tot = sum(Percentage_Occur)) %>%
  ungroup() %>%
  arrange(-tot,-Percentage_Occur) %>%
  mutate(top = rleid(tot)) %>%      # for this part you need data.table
  filter(top <= 3)

Upvotes: 2

Related Questions