Pertinax

Reputation: 435

Violin plot from summary data

I'd like to use a violin plot to visualise the number of archaeological artefacts by site (A and B) and by century with data in the following format (years are Before Present):

Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
22700 12  0
...
24800 112  32

There are some 6000 artefacts in total. For a violin plot, ggplot2 seems to expect one row per observation (artefact):

Site Year
A    22400
A    22400
... (356 times)
A    22400
B    22400
B    22400
... (182 times)
A    22500
A    22500
... (234 times)
A    22500
... ... ... (~5000 lines)
B    24800
B    24800
... (32 times)
B    24800

Is there an efficient way of converting the summary data frame (1st grey box) into an observation-by-observation data frame (2nd grey box) for use in a violin plot?

Alternatively, is there a way of making violin plots from data formatted as in the first grey box?

Update:

With the answer provided by eipi10, if either Site A or B has zero artefacts (as in the updated example above for the year 22,700), I get the following error:

Error in data.frame(Year = rep(dat$Year[i], dat$value[i]), Site = dat$key[i]) : 
  arguments imply differing number of rows: 0, 1

The plot would look like this:

[image: violin plot]

Upvotes: 0

Views: 229

Answers (1)

eipi10

Reputation: 93811

How about this:

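library(tidyverse)

# Summary data: artefact counts per year (BP) and site
dat = read.table(text="Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
24800 112  32", header=TRUE, stringsAsFactors=FALSE)

# Reshape to long format: one row per Year/Site pair, with the count in `value`
dat = gather(dat, key, value, -Year)

# Repeat each Year/Site pair `value` times, giving one row per artefact.
# rep() with a count of 0 simply produces no rows for that pair.
dat.long = data.frame(Year = rep(dat$Year, dat$value), Site = rep(dat$key, dat$value))

ggplot(dat.long, aes(Site, Year)) +
  geom_violin()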
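
If some counts are zero (as for Site B in year 22700 in the question), another option might be tidyr's uncount(), which repeats each row by a count column and drops rows whose count is 0. This is only a sketch, assuming a reasonably recent tidyr that provides both pivot_longer() and uncount(); the names dat_long, obs and n are made up for illustration, and the data frame is typed in from the rows shown in the question:

```r
library(tidyr)
library(ggplot2)

# Summary counts from the question, including the zero count for Site B
dat <- data.frame(
  Year  = c(22400, 22500, 22600, 22700, 24800),
  SiteA = c(356, 234, 144, 12, 112),
  SiteB = c(182, 124, 231, 0, 32)
)

# Long format: one row per Year/Site combination, count in column `n`
# (`n` is an illustrative name, not something from the original answer)
dat_long <- pivot_longer(dat, cols = c(SiteA, SiteB),
                         names_to = "Site", values_to = "n")

# One row per artefact; rows with n == 0 are dropped, and the
# count column itself is removed from the result
obs <- uncount(dat_long, n)

p <- ggplot(obs, aes(Site, Year)) +
  geom_violin()
print(p)
```

The result has the same shape as dat.long above (one row per artefact, with Site and Year columns), so the plotting call stays the same.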

Upvotes: 1
