user5613688

Reputation: 43

Histogram with multiple bins and groups

I am trying to reproduce in R these three simple histograms, which were created in Excel, in order to have something slightly more appealing to the eye. I have no doubt this is simple, but I am out of practice with R.

[image: data]

[image: histogram]

I have found different tutorials for producing basic histograms, but have yet to find something that will produce three columns (representing years) for each of the distance bins, and then three separate graphs for each of the data groups (A, B, C).
I believe the first thing I need to do is restructure my data, and I guess this is the step I am unsure about.

Thanks in advance.

Upvotes: 0

Views: 2873

Answers (2)

smandape

Reputation: 1043

Yes, you will have to restructure your data. You can do it in R as shown by @stefan or, if that is challenging, in Excel itself. Tidy data is easy to plot and analyze (see section 12.1 for tidy data and sections 3.7 and 3.8 for visualization). Tidy data here would consist of four columns: distance, value, year, and group (Dist, Val, Val_year, and Val_grp in the example code that follows).
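As a rough sketch of that restructuring in R: the layout of the original spreadsheet is not shown, so the wide table here (one column per group/year pair, named like A_2015) and all of its values are assumptions made up for illustration. pivot_longer from tidyr splits those column names into a group column and a year column.

```r
library(tidyr)

# Hypothetical wide table: one row per distance bin,
# one column per group/year combination (values are made up).
wide <- data.frame(
  Dist   = c("0-0.5", "0.5-1"),
  A_2015 = c(3, 5), A_2016 = c(4, 2), A_2017 = c(1, 6),
  B_2015 = c(2, 2), B_2016 = c(0, 7), B_2017 = c(5, 3)
)

# Reshape to tidy (long) format: one row per bin, group and year.
tidy <- pivot_longer(
  wide,
  cols      = -Dist,
  names_to  = c("Val_grp", "Val_year"),
  names_sep = "_",
  values_to = "Val"
)
tidy
```

The result has the four columns Dist, Val_grp, Val_year and Val, with one row for every bin, group and year.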

As an example, I stored some data as a tab-delimited file (testdata.txt) and read it in using tidyverse's read_delim function. The example code follows:

library(tidyverse)
foo <- read_delim("testdata.txt", delim = "\t")
foo %>%
  mutate(Val_year = factor(Val_year, levels = c("2015", "2016", "2017"))) %>%
  ggplot() +
  geom_bar(aes(x = Dist, y = Val, fill = Val_year), stat = "identity", position = "dodge") +
  facet_grid(. ~ Val_grp)

Upvotes: 2

stefan

Reputation: 123768

Using some random example data, the following code is a tidyverse solution that gives you a bar or column chart mimicking your Excel chart for one dataset (as your data is already binned, this is the way to go). As you already guessed, the tricky part is getting your data into R (to this end, have a look at the readxl package) and rearranging it for plotting (this is done via pivot_longer from the tidyr package and mutate from dplyr, both of which are part of the tidyverse). For the plotting part I use ggplot2, which is, you might have guessed it ;), also part of the tidyverse.

# Example data set
set.seed(42)

df <- data.frame(
  distance = paste0(seq(0, 3.5, by = 0.5), "-", seq(0.5, 4, by = 0.5)),
  `2015` = round(runif(8) * 8, 0),
  `2016` = round(runif(8) * 8, 0),
  `2017` = round(runif(8) * 8, 0)
)
df
#>   distance X2015 X2016 X2017
#> 1    0-0.5     7     5     8
#> 2    0.5-1     7     6     1
#> 3    1-1.5     2     4     4
#> 4    1.5-2     7     6     4
#> 5    2-2.5     5     7     7
#> 6    2.5-3     4     2     1
#> 7    3-3.5     6     4     8
#> 8    3.5-4     1     8     8

library(tidyverse)

df %>% 
  # Convert the dataset to long format
  pivot_longer(-distance, names_to = "Year", values_to = "Value") %>% 
  # format the dates, get rid of leading Xs
  mutate(Year = gsub("^X", "", Year)) %>% 
  ggplot(aes(distance, Value, fill = Year)) + 
  # Column chart. Add some width between columns
  geom_col(position = position_dodge2(2)) +
  scale_y_continuous(expand = expansion(mult = c(0, .05))) +
  scale_fill_manual(values = c("blue", "orange", "grey")) +
  # Get rid of axis and legend labels
  labs(y = "", x = "", fill = "") +
  theme_bw() +
  theme(legend.position = "bottom")

Created on 2020-04-05 by the reprex package (v0.3.0)
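The code above covers one dataset. A possible extension to the three groups (A, B, C) from the question is sketched here, assuming each group has the same bins and years. The helper make_group and the column name Group are hypothetical, and the values are again random; facet_wrap then draws one panel per group.

```r
library(tidyverse)

set.seed(42)
bins <- paste0(seq(0, 3.5, by = 0.5), "-", seq(0.5, 4, by = 0.5))

# Hypothetical helper: random values for one group (replace with real data).
make_group <- function() {
  data.frame(
    distance = bins,
    `2015` = round(runif(8) * 8),
    `2016` = round(runif(8) * 8),
    `2017` = round(runif(8) * 8),
    check.names = FALSE  # keep "2015" etc. as names, no leading X
  )
}

# Stack the three groups and record which group each row came from.
all_groups <- bind_rows(
  list(A = make_group(), B = make_group(), C = make_group()),
  .id = "Group"
)

# Long format: one row per group, distance bin and year.
long <- all_groups %>%
  pivot_longer(-c(Group, distance), names_to = "Year", values_to = "Value")

# One panel per group, three dodged columns (years) per bin.
p <- ggplot(long, aes(distance, Value, fill = Year)) +
  geom_col(position = "dodge") +
  facet_wrap(~Group, ncol = 1) +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

Using facet_grid(. ~ Group) instead of facet_wrap would place the three panels side by side.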

Upvotes: 1
