Reputation: 2267
I have a df like this:
library(dplyr)
library(data.table)
library(ggplot2)
library(scales)
Events <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","D_1","D_2","D_3","E_1","E_2","E_3")
Percentage_Occur <- c(4.6,6.5,1.3,3.4,6.6,5.5,7.8,2.2,2.4,2.1,6.6,5.9,4.9,11.1,4.3)
df1 <- data.frame(Events,Percentage_Occur)
I am trying to combines these sub events into individual category to determine the top 3 events and hence I do it like this:
df2 <- data.frame(df1)
df2$Events <- gsub("*_.*", "\\1", df2$Events)
df2 <- df2 %>% group_by(Events) %>% summarise_each(funs(sum(., na.rm = T)))
df2 <- df2[ order(-df2[,2]), ]
ggplot(df2, aes(reorder(Events,Percentage_Occur), Percentage_Occur)) +
geom_bar(stat = "identity") + coord_flip() +
xlab("Events")
Once I am able to visualize the top 3 events (In this case they are E,B,D), I would like to plot the sub events of these top 3events in a separate plot like this.
I manually did this by extracting the sub events for the top events. Since this is a sample dataset, I was able to do it. I would like to apply the logic to a bigger dataset that I have.
All I want is to determine the top events (which I am able to do), and then somehow programatically plot the corresponding sub events for those top events without manually looking at the dataframe and extracting it. One of the reason is because these events change over time and would like to run this type of logic every 2 hours to see any new events that get to the top. Kindly please provide your inputs on this and help me in moving forward.
Note: I would also like to have better colors.
Upvotes: 2
Views: 124
Reputation: 83225
In order to automatically extract the top 3 events, you can process your data with:
library(data.table)
library(splitstackshape)
dt <- cSplit(df1, "Events", sep="_", drop=FALSE)[, tot := sum(Percentage_Occur), Events_1
][order(-tot,Events_1,-Percentage_Occur)
][, top := rleid(tot)
][top <= 3]
And then with the help of scale_fill_manual
and reorder()
, you will get a plot with nicer colors & also ordered bars within the groups:
# create a vector for the labels of the events in the legend
# this is needed to get the legend in the right order
lbls <- dt$Events
# create the plot
ggplot(dt, aes(x=reorder(Events_1,-tot), y=Percentage_Occur, fill=reorder(Events,-Percentage_Occur), color="black")) +
geom_bar(stat="identity", position = position_dodge(width=0.8), width=0.7) +
geom_text(aes(y=1, label=Events), position = position_dodge(width=0.8), angle=90) +
labs(title="Occurence by Event", x="Events", y="Percentage Occurance") +
scale_color_manual(values = "black") +
scale_fill_manual("Events", breaks = lbls,
values = c('#d53e4f','#3288bd','#fee08b','#ffffbf','#66c2a5','#f46d43','#fdae61','#abdda4','#e6f598')) +
guides(color=FALSE) +
theme_minimal()
which will result in:
You can adapt the color-values in scale_fill_manual
to your own preferences.
The data preparation can also be done with dplyr
/tidyr
(although you will still need data.table
for rleid
function):
library(dplyr)
library(tidyr)
library(data.table)
df1 <- df1 %>% separate(Events, c("Event","Subevent"), sep="_", remove=FALSE) %>%
ungroup() %>%
group_by(Event) %>%
mutate(tot = sum(Percentage_Occur)) %>%
ungroup() %>%
arrange(-tot,-Percentage_Occur) %>%
mutate(top = rleid(tot)) %>% # for this part you need data.table
filter(top <= 3)
Upvotes: 2