Reputation: 199
I'm doing some basic data analysis on this dataset: https://www.kaggle.com/murderaccountability/homicide-reports
I'm generating a basic barplot with the State names on the x-axis. The y-axis values are each state's percentage of nationwide homicides (the number of entries for that state divided by the total number of entries in the data set).
barplot(prop.table(table(homicideData.raw$State)),
        main = "Nationwide Homicide % per State",
        ylab = "Accounting % of Nation-wide Homicides",
        las = 2)
The result is very messy. Is there a way to group, say, 5 states under a single x-axis label without changing the bars?
Let's say the following for example:
x-axis labels: "Alaska - California", "Colorado - Florida", ... (and so on). Each label should then have 5 bars above it.
Upvotes: 1
Views: 261
Reputation: 7163
Here's a solution with ggplot2. It's not the simplest, as it involves some data manipulation.
(1) read in the data-set and extract the homicide count/proportion by state:
df <- read.csv("homicide.csv")
library(dplyr)
freq <- with(df, table(State)) %>% data.frame
freq <- freq %>% mutate(prop = Freq/sum(Freq))
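To see what step (1) produces without downloading the CSV, here is a minimal sketch on a made-up data frame. The `toy` data and its four rows are invented for illustration; only the `State` column name comes from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per homicide record
toy <- data.frame(State = c("Alabama", "Alaska", "Alaska", "Arizona"))

# Same two steps as above: count rows per state, then convert counts to proportions
freq_toy <- with(toy, table(State)) %>% data.frame
freq_toy <- freq_toy %>% mutate(prop = Freq / sum(Freq))

print(freq_toy)
```

The result has one row per state with the columns `State`, `Freq` and `prop`, and the `prop` column sums to 1.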
(2) find first and last element of each group of 5 states:
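# Start index of every block of 5 states
hd <- seq(1, nrow(freq), by=5)
# Drop the last start so the final group absorbs all remaining states
hd <- hd[-length(hd)]
# End index of each group: one before the next start; the last group ends at the final row
td <- c((hd-1)[-1], nrow(freq))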
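As a concrete check of step (2), the sketch below works through the indices for a table of 51 rows (50 states plus the District of Columbia). The row count is an assumption and has not been verified against the file.

```r
n <- 51  # assumed number of rows in freq

hd <- seq(1, n, by = 5)    # candidate group starts: 1, 6, ..., 46, 51
hd <- hd[-length(hd)]      # drop the last start, leaving 10 groups
td <- c((hd - 1)[-1], n)   # group ends: 5, 10, ..., 45, 51

# Number of states in each group
sizes <- td - hd + 1
print(rbind(hd, td, sizes))
```

With 51 rows this gives nine groups of 5 and a final group of 6. Because the last start is always dropped, a table with exactly 50 rows would end with a group of 10 states.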
(3) custom functions to build the label for each group (e.g. Alb - Clf) and to calculate the length of each group:
abbrevFn <- function(head, tail, state, ...) paste(abbreviate(state[c(head,tail)], ...), collapse = " - ")
intervalFn <- function(head, tail) diff(c(head, tail)) + 1
(4) group the states by repeating each group's label once for every state in that group:
freq$group <- lapply(seq_along(hd), function(x) rep(abbrevFn(hd[x], td[x], freq$State, minlength=3),
                                                    intervalFn(hd[x], td[x]))) %>% unlist
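Steps (2) to (4) can be tried without the dataset. In the sketch below, `states` is a stand-in built from R's built-in `state.name` vector plus the District of Columbia; the dataset's own spellings may differ, so the labels are only indicative.

```r
# Built-in state names plus DC, inserted after Delaware to keep alphabetical order
states <- append(state.name, "District of Columbia", after = 8)

# Group boundaries as in step (2)
hd <- seq(1, length(states), by = 5)
hd <- hd[-length(hd)]
td <- c((hd - 1)[-1], length(states))

# Helper functions as in step (3)
abbrevFn <- function(head, tail, state, ...) paste(abbreviate(state[c(head, tail)], ...), collapse = " - ")
intervalFn <- function(head, tail) diff(c(head, tail)) + 1

# One label per state: each group's label repeated once per member
group <- unlist(lapply(seq_along(hd), function(x)
  rep(abbrevFn(hd[x], td[x], states, minlength = 3), intervalFn(hd[x], td[x]))))

# The first five states share one label; the sixth starts the next group
print(data.frame(state = states, group = group)[1:7, ])
```

Every state ends up with the label of its group, and that label is what ggplot later uses as the x value.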
(5) plot geom_bar based on the customised group, and dodge position by state:
# x positions of the dotted separators: half a unit on either side of each group
xint <- c((1:length(hd) - .5), (1:length(hd) + .5)) %>% unique
library(ggplot2)
ggplot(freq, aes(group, prop, fill=State)) +
  geom_bar(stat="identity", position="dodge", width=1) +
  scale_fill_manual(values=rep("gray80", nrow(freq))) +  # same grey for every bar
  ylab("Accounting % of Nation-wide Homicides") +
  xlab("States") +
  geom_vline(xintercept=xint, linetype="dotted") +
  guides(fill="none") +  # hide the per-state legend
  theme_bw()
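If staying with base `barplot` is preferable, one possible sketch (separate from the ggplot approach above) keeps every bar and draws one label per group instead of one per state. It relies on `barplot()` returning the bar midpoints. The counts are random placeholders and the built-in `state.name` vector stands in for the dataset's states; `size`, `grp` and the label format are choices made for illustration.

```r
set.seed(1)
states <- state.name                                           # 50 built-in state names
counts <- setNames(sample(100:1000, length(states)), states)  # made-up counts
prop   <- counts / sum(counts)

size <- 5                                   # states per x-axis label
grp  <- ceiling(seq_along(prop) / size)     # group index for each bar

op <- par(mar = c(12, 4, 4, 2))             # extra bottom margin for long labels

# Draw the bars without per-state labels; barplot() returns the bar midpoints
mids <- barplot(prop, axisnames = FALSE,
                main = "Nationwide Homicide % per State",
                ylab = "Accounting % of Nation-wide Homicides")

# One label per group, built from its first and last state and centred under its bars
at   <- as.vector(tapply(mids, grp, mean))
labs <- as.vector(tapply(names(prop), grp,
                         function(s) paste(s[1], s[length(s)], sep = " - ")))
axis(1, at = at, labels = labs, las = 2, tick = FALSE, cex.axis = 0.7)

par(op)                                     # restore the previous margins
```

With the real data, `prop` would be `prop.table(table(homicideData.raw$State))` from the question. Unlike the ggplot version, a leftover group at the end simply gets fewer than `size` bars.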
Upvotes: 2