HariSeldon
HariSeldon

Reputation: 33

ggplot: weighted.mean and stat_summary in a facetted bar plot

I've spent too much time trying to figure out a solution for including weighted.mean (or wtd.mean) into stat_summary and make it work properly. I've looked to several pages trying to tackle the same issue but none had a definitive solution. The main problem is that weighted.mean, once place in stat_summary, fails to find its weights component, which apparently can not be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see examples). Now, I tried various approaches and I've even produced a barplot of weighted means using a ddplyr based function (as suggested in another page) but, beside being a bit cluncky, it does not allow facetting, as it changes the source dataframe.

The following is dataframe built on purpose for this problem.

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)

(A note: I've also added the "shape" column even if I will not use it, just to remind me that I don't want to lose any data in the process but it needs to be available at the end.)

The original setting was a ggplot just with "mean" which includes facetting, as in:

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

The following is the corresponding code but with "weighted.mean" --> the "w" aestethics is ignored, therefore it assumes all the weights to be equal (by the weighted.mean function definition), which results in a simple mean

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it gives a warning Warning: Ignoring unknown aesthetics: w

I tried several ways to make it "see" the weight variable but with no success. In the end I realised that the most promising way would be to redefine the weight.mean function so that its default "w" would be a function of "x". Weighted.mean would still not see any "w" aeshetics but it would compute one as default. To achieve this I tried to nest the native function (weighted.mean) into a generic function, which allows me to change the arguments.

Step by step.

First I tried with "mean" (and it works).

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Then with weighted.mean

   weighted.mean.modif <- function(x,w) {
      weighted.mean(x,w)
    }

 ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

but it still doesn't read the "w" (as there's no "w" specified) so it gives back a normal mean.

Then I tried to specify the "w" argument as the weights column in the dataframe

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

but it doesn't work. A warnign message says: Computation failed in stat_summary(): 'x' and 'w' must have the same length

Being stuck, I tried to generate a random series of numbers but of the same length as "x" and it surprisingly worked.

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Obviously, there's a way to trick it but it's no use if I can use only random weights.

I tried to print "x" within the function and then applied it and, while it produces something, even "mean" doesn't work properly anymore.

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}

So, the tricky part that I can not figure out is how to relate properly the "w" default to the "x" so that when the weighted.mean is called within stat_summary, not reading a "w", uses anyway the correct weights.

As I mentioned, there is also a ddply workaround to obtain a weighted mean plot - as it is based on creating a new source dataframe with just the variables already organised and the weighted means, but it does not allow facetting!!!

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")

Thanks!

Upvotes: 2

Views: 4548

Answers (1)

jdobres
jdobres

Reputation: 11957

ggplot's built-in summary functions aren't always helpful, and much of the time you're better off computing your summary in a separate step and then plotting that.

Your basic example plot is actually incorrect. It shows "aether" as having means for delta and epsilon of 5 and 7, respectively, which is clearly not the case in the raw data (both these values are 1). But those are the values for the first element in the data frame ("water"). The error arises because ggplot builds its facets in alphabetical order, while at the same time, you are passing in the raw vectors (platonic$value, rather than simply value), which causes things to be plotted in the wrong position. You should always pass the raw, unquoted column name when working with ggplot, so that ggplot can figure out how to work with the associated data.

The correct version of your basic plot would be:

g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)
print(g)

enter image description here

As for using weighted.mean, as I said above, the only reasonable course of action here is to compute that separately, and plot the result:

platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, weights = importance))

Since the resulting data frame still has all the column names used in the first plot, you can just swap in the new data set:

g.weighted <- g %+% platonic.weighted

With this example, the two plots are identical, but your mileage may vary.

Your question is a little unclear as to what your expected end result is, but from the example given, I assume you want a weighted mean for each greek letter. We can use summarize to do that easily, or if you really wanted, you could use mutate instead to insert a column of weights without losing the original data:

platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, weights = importance))

Upvotes: 3

Related Questions