H_A

Reputation: 677

ggplot: Plotting the bins on x-axis and the average on y-axis

Suppose that I have a dataframe that looks like this:

data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))

What I would like to do is to cut the x values into bins, such as:

data$bins <- cut(data$x,breaks = 4)

Then, I would like to plot the result with ggplot so that the x-axis shows the bins and the y-axis shows the mean of the data$y values that fall into each bin.

Thank you in advance

Upvotes: 4

Views: 13084

Answers (3)

chromestone

Reputation: 203

This thread is a bit old, but here you go: use stat_summary_bin (available in newer versions of ggplot2). It bins x and summarises y in a single step, so you do not need cut() first.

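library(ggplot2)

# Bin x and plot the mean of y per bin (use fun.y instead of fun on ggplot2 < 3.3.0)
ggplot(data, mapping = aes(x, y)) +
  stat_summary_bin(fun = "mean", geom = "bar", bins = 4 - 1) +
  ylab("mean")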
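To check which bins stat_summary_bin actually used and which means it computed, the layer data can be inspected. This is only a sketch: the object names `p` and `computed` are arbitrary, `set.seed(7)` is there purely for reproducibility, and `fun =` assumes ggplot2 3.3.0 or later.

```r
set.seed(7)
data <- data.frame(y = rnorm(10, 0, 1), x = runif(10, 0, 1))

computed <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x, y)) +
    stat_summary_bin(fun = "mean", geom = "bar", bins = 4 - 1)
  # layer_data() returns the data frame the first layer draws from,
  # i.e. one row per bin with the summarised y value.
  computed <- layer_data(p, 1)
  print(computed)
}
```

The bin edges chosen here are not guaranteed to match those produced by cut(), so the two approaches can give slightly different means.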


Upvotes: 4

toldo

Reputation: 416

Since the mean of your y values can be smaller than 0, I recommend a dot plot instead of a bar chart. The dots represent the means. You can use either qplot or the regular ggplot function. The latter is more customizable. In this example, both produce the same output.

library(ggplot2)

set.seed(7)
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
data$bins <- cut(data$x,breaks = 4, dig.lab = 2)

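# qplot() version. Note: qplot()'s stat argument is deprecated in newer
# ggplot2 releases, where this call may no longer summarise the data.
qplot(bins, y, data = data, stat="summary", fun.y = "mean")

# ggplot() version (use fun.y instead of fun on ggplot2 < 3.3.0)
ggplot(data, aes(x = factor(bins), y = y)) + 
  stat_summary(fun = mean, geom = "point")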

You can also add error bars. In this case, they show the mean +/- 1.96 times the group standard deviation. The group mean and SD can be obtained using tapply. Note that a bin containing a single observation has no SD, so no error bar is drawn for it.

m <- tapply(data$y, data$bins, mean)
s <- tapply(data$y, data$bins, sd)  # named s so the sd() function is not masked
df <- data.frame(mean.y = m, sd = s, bin = names(m))

ggplot(df, aes(x = bin, y = mean.y, 
               ymin = mean.y - 1.96*sd, 
               ymax = mean.y + 1.96*sd)) + 
  geom_errorbar() + geom_point(size = 3)
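If the bars should describe the uncertainty of the mean rather than the spread of the data, one option is to divide the SD by the square root of the bin count. The following is a minimal sketch, not part of the answer above: the helper `bin_summary` and the columns `n`, `se`, `lower` and `upper` are made up for illustration, and the 1.96 multiplier assumes an approximately normal sampling distribution, which is a strong assumption with so few points per bin.

```r
set.seed(7)
data <- data.frame(y = rnorm(10, 0, 1), x = runif(10, 0, 1))
data$bins <- cut(data$x, breaks = 4, dig.lab = 2)

# Hypothetical helper: per-bin mean, SD, count and standard error.
bin_summary <- function(y, bins) {
  m <- tapply(y, bins, mean)
  s <- tapply(y, bins, sd)      # NA for bins with fewer than two points
  n <- tapply(y, bins, length)  # NA for empty bins
  out <- data.frame(
    bin    = factor(names(m), levels = levels(bins)),
    mean.y = as.vector(m),
    sd     = as.vector(s),
    n      = as.vector(n)
  )
  out$se    <- out$sd / sqrt(out$n)
  out$lower <- out$mean.y - 1.96 * out$se
  out$upper <- out$mean.y + 1.96 * out$se
  out
}

df_se <- bin_summary(data$y, data$bins)
print(df_se)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_se <- ggplot(df_se, aes(x = bin, y = mean.y, ymin = lower, ymax = upper)) +
    geom_errorbar(width = 0.2) +
    geom_point(size = 3)
  print(p_se)
}
```

Building `bin` as a factor with the original levels keeps the bins in their natural order on the x-axis.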


Upvotes: 1

maccruiskeen

Reputation: 2818

You can use the stat_summary() function.

library(ggplot2)
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
data$bins <- cut(data$x,breaks = 4)
# Points (use fun.y instead of fun on ggplot2 < 3.3.0):
ggplot(data, aes(x = bins, y = y)) +
  stat_summary(fun = "mean", geom = "point")

# Bars (geom = "histogram" is not accepted by newer ggplot2 versions):
ggplot(data, aes(x = bins, y = y)) +
  stat_summary(fun = "mean", geom = "bar")
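To see what stat_summary() is computing, the per-bin means can also be worked out beforehand and plotted directly. This is a sketch rather than part of the answer above: the object name `bin_means` is arbitrary, `set.seed(1)` is only there to make the example reproducible, and `geom_col()` assumes a reasonably recent ggplot2.

```r
set.seed(1)
data <- data.frame(y = rnorm(10, 0, 1), x = runif(10, 0, 1))
data$bins <- cut(data$x, breaks = 4)

# One row per non-empty bin, holding the mean of y in that bin.
bin_means <- aggregate(y ~ bins, data = data, FUN = mean)
print(bin_means)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_means <- ggplot(bin_means, aes(x = bins, y = y)) +
    geom_col() +
    ylab("mean of y")
  print(p_means)
}
```

Having the means in a plain data frame also makes it easy to check the numbers before plotting them.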


Upvotes: 5
