Reputation: 43
I am trying to reproduce in R these three simple histograms, originally created in Excel, in order to have something slightly more appealing to the eye. I have no doubt this is simple, but I am out of practice with R.
I have found different tutorials for producing basic histograms, but have yet to find something that will produce three columns (representing years) for each of the distance bins, and then three separate graphs for each of the data groups (A, B, C).
I believe the first thing I need to do is restructure my data, and I guess this is the step I am unsure about.
Thanks in advance.
Upvotes: 0
Views: 2873
Reputation: 1043
Yes, you will have to restructure your data. You can do it in R as shown by @stefan or, if that is challenging, in Excel itself. Tidy data is easy to plot and analyze (see section 12.1 for tidy data and sections 3.7 and 3.8 for visualization). For this problem, tidy data consists of four columns: distance bin, value, year, and group (named Dist, Val, Val_year, and Val_grp in the code below).
As an example, I stored some data as a tab-delimited file (testdata.txt) and read it in with the tidyverse's read_delim function. The example code follows:
library(tidyverse)
foo <- read_delim("testdata.txt", delim = "\t")
foo %>%
  mutate(Val_year = factor(Val_year, levels = c("2015", "2016", "2017"))) %>%
  ggplot() +
  geom_bar(aes(x = Dist, y = Val, fill = Val_year), stat = "identity", position = "dodge") +
  facet_grid(. ~ Val_grp)
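For readers without a testdata.txt at hand, here is a minimal sketch of what such a tidy table could look like. The values are random and the four distance bins are made up for illustration; only the column names (Dist, Val, Val_year, Val_grp) come from the code above. It uses geom_col, which is equivalent to geom_bar with stat = "identity".

```r
library(tidyverse)

# Hypothetical stand-in for testdata.txt:
# one row per combination of group, year, and distance bin
set.seed(1)
foo <- expand_grid(
  Val_grp  = c("A", "B", "C"),
  Val_year = c("2015", "2016", "2017"),
  Dist     = c("0-0.5", "0.5-1", "1-1.5", "1.5-2")
) %>%
  mutate(Val = sample(0:8, n(), replace = TRUE))  # random counts, illustration only

p <- foo %>%
  mutate(Val_year = factor(Val_year, levels = c("2015", "2016", "2017"))) %>%
  ggplot() +
  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_col(aes(x = Dist, y = Val, fill = Val_year), position = "dodge") +
  facet_grid(. ~ Val_grp)
p
```

Each row holds exactly one value, so adding another year or group means adding rows rather than columns, and the plotting code stays the same.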
Upvotes: 2
Reputation: 123768
Using some random example data, the following code is a tidyverse solution that gives you a bar (column) chart mimicking your Excel chart for one dataset; since your data is already binned, a bar chart rather than a histogram is the way to go. As you already guessed, the tricky part is getting your data into R (to this end, have a look at the readxl package) and rearranging it for plotting (this is done via pivot_longer from the tidyr package and mutate from dplyr, both of which are part of the tidyverse). For the plotting part I use ggplot2, which is, as you might have guessed (;, also part of the tidyverse.
# Example data set
set.seed(42)
df <- data.frame(
distance = paste0(seq(0, 3.5, by = 0.5), "-", seq(0.5, 4, by = 0.5)),
`2015` = round(runif(8) * 8, 0),
`2016` = round(runif(8) * 8, 0),
`2017` = round(runif(8) * 8, 0)
)
df
#>   distance X2015 X2016 X2017
#> 1    0-0.5     7     5     8
#> 2    0.5-1     7     6     1
#> 3    1-1.5     2     4     4
#> 4    1.5-2     7     6     4
#> 5    2-2.5     5     7     7
#> 6    2.5-3     4     2     1
#> 7    3-3.5     6     4     8
#> 8    3.5-4     1     8     8
library(tidyverse)
df %>%
  # Convert the dataset to long format
  pivot_longer(-distance, names_to = "Year", values_to = "Value") %>%
  # Format the year labels: get rid of the leading Xs
  mutate(Year = gsub("^X", "", Year)) %>%
  ggplot(aes(distance, Value, fill = Year)) +
  # Column chart. Add some width between columns
  geom_col(position = position_dodge2(2)) +
  scale_y_continuous(expand = expansion(mult = c(0, .05))) +
  scale_fill_manual(values = c("blue", "orange", "grey")) +
  # Get rid of axis and legend labels
  labs(y = "", x = "", fill = "") +
  theme_bw() +
  theme(legend.position = "bottom")
Created on 2020-04-05 by the reprex package (v0.3.0)
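To get the three separate graphs for groups A, B and C that the question asks for, one possible extension is to stack the three wide tables with a group label and facet on it. This is only a sketch: the helper make_group and the random values are made up for illustration, and it assumes each group arrives as its own wide table with the same columns.

```r
library(tidyverse)

# Hypothetical helper: one wide table of random counts per data group
make_group <- function() {
  data.frame(
    distance = paste0(seq(0, 3.5, by = 0.5), "-", seq(0.5, 4, by = 0.5)),
    `2015` = round(runif(8) * 8, 0),
    `2016` = round(runif(8) * 8, 0),
    `2017` = round(runif(8) * 8, 0),
    check.names = FALSE  # keep "2015" etc. as names, so no leading X to strip
  )
}

set.seed(42)
groups <- list(A = make_group(), B = make_group(), C = make_group())

# Stack the groups, recording the list names in a Group column,
# then convert to long format as before
df_long <- bind_rows(groups, .id = "Group") %>%
  pivot_longer(-c(Group, distance), names_to = "Year", values_to = "Value")

p <- ggplot(df_long, aes(distance, Value, fill = Year)) +
  geom_col(position = "dodge") +
  # One panel per data group, stacked vertically
  facet_wrap(~Group, ncol = 1) +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

With real data, the list could instead be filled by reading one sheet or range per group (for example with readxl), as long as the column names match across groups.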
Upvotes: 1