Reputation: 711
I'm doing some analysis on the auto miles per gallon data from UCI website:
https://archive.ics.uci.edu/ml/datasets/Auto+MPG
I factored the first column into either high or low mileage:
mpg01 = I(auto1$mpg >= median(auto1$mpg))
Auto = data.frame(mpg01, auto1[,-1])
head(Auto)
mpg01 cylinders displacement horsepower weight acceleration year origin
1 FALSE 8 307 130 3504 12.0 70 1
2 FALSE 8 350 165 3693 11.5 70 1
3 FALSE 8 318 150 3436 11.0 70 1
4 FALSE 8 304 150 3433 12.0 70 1
5 FALSE 8 302 140 3449 10.5 70 1
6 FALSE 8 429 198 4341 10.0 70 1
Now I want to make a boxplot for each column of the data frame, grouped by the first column:
vars <- c("cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin")
ggplot(Auto) + geom_bar(aes(y=vars, fill=factor(mpg01)))
And I get the error "Aes must be either length 1 or the same as data"
The dimension of "Auto" dataframe is 392x8
I can just use boxplot for each column, but want to know if there's a way to combine them into one. Thanks!
Upvotes: 1
Views: 2368
Reputation: 3369
Updated to explain the error: aes(x, y, ...) describes how variables of the data frame are mapped onto the geoms, so each aesthetic must refer to a column of the data (or be a single constant). In your call, y=vars passes a character vector of 7 column names, which is neither length 1 nor the 392 rows of Auto, and no x variable has been defined for geom_boxplot. To make each column of your df a value of the x variable, the df needs to be reshaped to long format (e.g. using reshape2::melt or tidyr::gather).
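To make "long format" concrete, here is a minimal base R sketch of the same idea. The three-row toy data frame and the names toy, measure_vars and long are made up for illustration and are not part of the Auto data; stack() is used only as a stand-in for what melt does.

```r
# Hypothetical three-row stand-in for Auto (values invented for illustration)
toy <- data.frame(
  mpg01     = c(FALSE, TRUE, TRUE),
  cylinders = c(8, 4, 4),
  weight    = c(3504, 2130, 2372)
)
measure_vars <- c("cylinders", "weight")

# stack() piles the selected columns on top of each other:
# `values` holds the numbers, `ind` records which column each came from
long <- stack(toy[, measure_vars])

# Repeat the grouping column once per stacked variable so rows stay aligned
long$mpg01 <- rep(toy$mpg01, times = length(measure_vars))

print(long)
```

Each original column becomes a block of rows, so a single column (ind here, variable after melt) can be mapped to x or used for faceting.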
Below is a solution based on mtcars rather than your data; it should carry over. If it does not, we can troubleshoot once you post the output of dput(Auto). The plot you generate should look like the one I attached. First, reshape your data:
library(reshape2)
library(ggplot2)
mtcars_melt <- melt(mtcars)
I can now define x in aes. Note the difference between the two facet_wrap cases below.
# First with no facet_wrap
ggplot(mtcars_melt, aes(x=variable, y=value, fill=variable)) + geom_boxplot()
# Case 1 with facet_wrap
ggplot(mtcars_melt, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable)
# Case 2 with facet_wrap
ggplot(mtcars_melt, aes(x="", y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable)
In case 1, I set x=variable in aes, but facet_wrap then shares one x axis across the facets, so every facet reserves a position for all of the x variables. If I set x="" instead, each facet holds only a single box.
Now, to give each facet its own y-axis scale, I can set scales="free_y":
ggplot(mtcars_melt, aes(x="", y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free_y")
Alternatively, I can set scales="free" so that both the x and y axes are free, and use it with x=variable to arrive at a similar result:
ggplot(mtcars_melt, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free")
Edited: The code below should work for your particular data set:
library(reshape2)
library(ggplot2)
vars <- c("cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin")
Auto_melt <- melt(Auto[, vars])
ggplot(Auto_melt, aes(x="", y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free_y")
Edited with code to separate by mpg as requested: Redefine vars to include "mpg01", and melt the data with mpg01 as the id variable. Use mpg01 as the x aesthetic.
Auto <- structure(list(mpg01 = structure(c(2L, 1L, 1L, 1L, 1L), .Label = c("FALSE", "TRUE"), class = "factor"), cylinders = c(8L, 8L, 8L, 8L, 8L), displacement = c(307, 350, 318, 304, 302), horsepower = c(130L, 165L, 150L, 150L, 140L), weight = c(3504L, 3693L, 3436L, 3433L, 3449L), acceleration = c(12, 11.5, 11, 12, 10.5), year = c(70L, 70L, 70L, 70L, 70L), origin = c(1L, 1L, 1L, 1L, 1L)), .Names = c("mpg01", "cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin"), row.names = c(NA, 5L), class = "data.frame")
vars <- c("mpg01", "cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin")
Auto_melt <- melt(Auto[, vars], id.vars="mpg01")
ggplot(Auto_melt, aes(x=mpg01, y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free_y")
Upvotes: 2
Reputation: 416
I think you should tidy your data first and then draw the boxplot. I downloaded the data from the website:
> head(df)
mpg01 cylinders displacement horsepower weight acceleration year origin
1 18 8 307 130 3504 12.0 70 1
2 15 8 350 165 3693 11.5 70 1
3 18 8 318 150 3436 11.0 70 1
4 16 8 304 150 3433 12.0 70 1
5 17 8 302 140 3449 10.5 70 1
6 15 8 429 198 4341 10.0 70 1
Use gather() from tidyr to tidy the data:
library("tidyr")
library("dplyr")
library("ggplot2")
tidy_df <- df %>% gather("vars","values",-mpg01)
And tidy_df is:
> head(tidy_df)
mpg01 vars values
1 18 cylinders 8
2 15 cylinders 8
3 18 cylinders 8
4 16 cylinders 8
5 17 cylinders 8
6 15 cylinders 8
Then you can draw the boxplot:
ggplot(data=tidy_df,aes(vars,values)) + geom_boxplot(aes(fill=vars))
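If the boxes should also be split by high/low mileage, as the question asks, the same tidy layout can be grouped by mpg01. This is only a sketch under assumptions: toy is a made-up six-row data frame with a logical mpg01 column (not the downloaded data), and tidyr::pivot_longer is swapped in for gather.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for Auto with a logical mpg01 column (values invented)
toy <- data.frame(
  mpg01     = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  cylinders = c(8, 8, 6, 4, 4, 4),
  weight    = c(3504, 3693, 3102, 2130, 2372, 1985)
)

# Keep mpg01 as the id column; every other column goes into vars/values
tidy_toy <- pivot_longer(toy, cols = -mpg01,
                         names_to = "vars", values_to = "values")

# One panel per variable, one box per mpg01 group, independent y scales
p <- ggplot(tidy_toy, aes(x = mpg01, y = values, fill = mpg01)) +
  geom_boxplot() +
  facet_wrap(~vars, scales = "free_y")
print(p)
```

Faceting with scales = "free_y" keeps variables on very different scales, such as cylinders and weight, readable in the same figure.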
Upvotes: 1