Reputation: 435
I'd like to use a violin plot to visualise the number of archaeological artefacts by site (A and B) and by century with data in the following format (years are Before Present):
Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
22700 12 0
...
24800 112 32
There are some 6000 artefacts in total. For a violin plot, ggplot2 appears to expect one row per observation (artefact):
Site Year
A 22400
A 22400
... (356 times)
A 22400
B 22400
B 22400
... (182 times)
A 22500
A 22500
... (234 times)
A 22500
... ... ... (~5000 lines)
B 24800
B 24800
... (32 times)
B 24800
Is there an efficient way of converting the summary data frame (first table above) into an observation-by-observation data frame (second table above) for use in a violin plot?
Alternatively, is there a way of making violin plots directly from data formatted as in the first table?
Update:
With the answer provided by eipi10, if either Site A or B has zero artefacts (as in the updated example above for the year 22,700), I get the following error:
Error in data.frame(Year = rep(dat$Year[i], dat$value[i]), Site = dat$key[i]) :
arguments imply differing number of rows: 0, 1
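For reference, the failing call can be reproduced in isolation. This is a minimal base R sketch, assuming the loop built one small data frame per row exactly as the error message shows; the values 22700 and "SiteB" come from the table above, and the names `res`, `years`, `sites`, `counts` and `ok` are only illustrative.

```r
# rep(x, 0) returns a zero-length vector, while the site label still has
# length one, so data.frame() cannot line the two columns up.
res <- tryCatch(
  data.frame(Year = rep(22700, 0), Site = "SiteB"),
  error = function(e) conditionMessage(e)
)
print(res)

# Repeating BOTH columns by the same count vector keeps their lengths equal;
# a count of zero then simply contributes no rows.
years  <- c(22600, 22700)
sites  <- c("SiteB", "SiteB")
counts <- c(231, 0)
ok <- data.frame(Year = rep(years, counts), Site = rep(sites, counts))
nrow(ok)
```

So a zero count is only a problem when the year is repeated but the site label is not.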
The plot would look like this:
Upvotes: 0
Views: 229
Reputation: 93811
How about this:
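library(tidyverse)

# Summary counts: one row per century, one column per site
dat = read.table(text="Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
24800 112 32", header=TRUE, stringsAsFactors=FALSE)

# Reshape to long format: one row per Year/site pair, with the count in `value`
dat = gather(dat, key, value, -Year)

# Repeat each Year and site label `value` times, giving one row per artefact.
# rep() with a count of zero contributes no rows, so empty sites are fine.
dat.long = data.frame(Year = rep(dat$Year, dat$value), Site = rep(dat$key, dat$value))

ggplot(dat.long, aes(Site, Year)) +
  geom_violin()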
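A possible alternative, sketched here under the assumption that a reasonably recent tidyr is installed (`pivot_longer()` needs tidyr 1.0.0 or later): `uncount()` duplicates each row according to a count column and drops rows whose count is zero. The row for 22700 from the question is included to show that case. The names `counts`, `dat_long`, `n` and `p` are only illustrative.

```r
library(tidyr)
library(ggplot2)

# Summary counts, including the year where SiteB has zero artefacts
dat <- read.table(text = "Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
22700 12 0
24800 112 32", header = TRUE)

# Long format: one row per Year/site pair, with the count in column `n`
counts <- pivot_longer(dat, cols = -Year, names_to = "Site", values_to = "n")

# One row per artefact: each row is repeated `n` times, and rows with
# n == 0 disappear. The `n` column itself is removed by default.
dat_long <- uncount(counts, weights = n)

p <- ggplot(dat_long, aes(Site, Year)) +
  geom_violin()
print(p)
```

The result has the same shape as `dat.long` above, so the plotting call is unchanged.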
Upvotes: 1