ajax2000

Reputation: 711

boxplot in R, aesthetics must be either length 1 or the same length as data

I'm doing some analysis on the Auto MPG (miles per gallon) data from the UCI website:

https://archive.ics.uci.edu/ml/datasets/Auto+MPG

I converted the first column (mpg) into a high/low mileage indicator:

mpg01 = I(auto1$mpg >= median(auto1$mpg))
Auto = data.frame(mpg01, auto1[,-1])
head(Auto)

    mpg01 cylinders displacement horsepower weight acceleration year origin
 1 FALSE         8          307        130   3504         12.0   70      1
 2 FALSE         8          350        165   3693         11.5   70      1
 3 FALSE         8          318        150   3436         11.0   70      1
 4 FALSE         8          304        150   3433         12.0   70      1
 5 FALSE         8          302        140   3449         10.5   70      1
 6 FALSE         8          429        198   4341         10.0   70      1

Now I want to make a boxplot for each of the columns in the data frame, grouped by the first column.

vars <- c("cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin")
ggplot(Auto) + geom_bar(aes(y=vars, fill=factor(mpg01)))

I get the error "Aesthetics must be either length 1 or the same as the data".

The Auto data frame is 392 x 8.

I could just call boxplot on each column separately, but I want to know whether there's a way to combine them into one plot. Thanks!
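For reference, the per-column approach mentioned above can already be combined into a single figure in base R by laying out several panels with par(mfrow). This is a minimal sketch that assumes the built-in mtcars data as a stand-in for Auto (the UCI data is not bundled with R), so the column names are mtcars's, not the question's:

```r
# Stand-in data: built-in mtcars (an assumption; the Auto data is not shipped with R)
mt <- mtcars
# Same high/low rule as in the question, applied to mtcars$mpg
mt$mpg01 <- mt$mpg >= median(mt$mpg)

vars <- c("cyl", "disp", "hp", "wt")

# Lay out a 2 x 2 grid of panels on one device, remembering the old settings
op <- par(mfrow = c(2, 2))
for (v in vars) {
  # One panel per variable, with separate FALSE/TRUE boxes for mpg01
  boxplot(mt[[v]] ~ mt$mpg01, main = v, xlab = "mpg01", ylab = v)
}
par(op)  # restore the previous panel layout
```

Each panel keeps its own y-axis, which is convenient here because the variables are on very different scales.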

Upvotes: 1

Views: 2368

Answers (2)

Djork

Reputation: 3369

Updated to explain the error: aes() expects each aesthetic to be either a column of the data frame (one value per row) or a single constant. In your call, aes(y=vars) maps the 7-element character vector vars onto a 392-row data frame, hence the error; even if the lengths happened to match, ggplot would plot the column names as strings rather than the values in those columns. To get one box per column, the data frame needs to be reshaped to long format (e.g. using reshape2::melt or tidyr::gather), so that one column holds the variable names and another holds their values.
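The length rule behind the error can be modelled in a few lines of base R. This is a sketch, not ggplot2's actual implementation: aes_length_ok() is a hypothetical helper, and Auto_small is a hand-built excerpt of the first five rows shown in the question's head(Auto) output.

```r
# Hypothetical excerpt: first five rows of Auto, copied from head(Auto) in the question
Auto_small <- data.frame(
  mpg01        = rep(FALSE, 5),
  cylinders    = rep(8L, 5),
  displacement = c(307, 350, 318, 304, 302),
  horsepower   = c(130L, 165L, 150L, 150L, 140L)
)
vars <- c("cylinders", "displacement", "horsepower")

# Hypothetical helper mirroring the rule in the error message:
# an aesthetic must have length 1 or exactly one value per row of the data
aes_length_ok <- function(x, data) length(x) %in% c(1L, nrow(data))

aes_length_ok(vars, Auto_small)                   # FALSE: 3 names vs 5 rows
aes_length_ok(Auto_small$horsepower, Auto_small)  # TRUE: one value per row
aes_length_ok("constant", Auto_small)             # TRUE: a single constant
```

Passing a vector of column names therefore fails the check; what ggplot needs is a single column whose entries say which variable each row belongs to, which is exactly what the long format provides.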

Below is a solution based on mtcars rather than your data. If it doesn't work for your data, we can troubleshoot it once you post the output of dput(Auto). The plot you generate should look like the one I attached. First, reshape your data.

library(reshape2)
library(ggplot2)
mtcars_melt <- melt(mtcars)
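To see what the reshaped data looks like without installing reshape2, base R's stack() produces an equivalent long table. This is a sketch of the shape of melt(mtcars)'s output, not a claim that the two functions are identical in every detail (for example, factor level order may differ):

```r
# Base-R sketch of the long format returned by reshape2::melt(mtcars):
# one row per (car, column) pair, holding the column name and the value
mtcars_long <- stack(mtcars)                         # columns: values, ind
names(mtcars_long) <- c("value", "variable")         # use melt()'s column names
mtcars_long <- mtcars_long[c("variable", "value")]   # same column order as melt()

dim(mtcars_long)      # 352 rows (32 cars x 11 columns), 2 columns
head(mtcars_long, 3)  # the first three mpg values, labelled "mpg"
```

Because every original column now lives in the single value column, aes(x=variable, y=value) has exactly one entry per row, which satisfies the length rule.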

Now x can be defined in aes. Note the difference between the two cases below when combined with facet_wrap:

# First with no facet_wrap
ggplot(mtcars_melt, aes(x=variable, y=value, fill=variable)) + geom_boxplot()
# Case 1 with facet_wrap
ggplot(mtcars_melt, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable)
# Case 2 with facet_wrap
ggplot(mtcars_melt, aes(x="", y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable)

In case 1, x=variable is mapped in aes, but facet_wrap uses a shared x scale by default, so every facet shows all the x levels, most of them empty. Setting x="" maps a single constant instead, so each facet holds only one box.

To let each facet's y-axis have its own scale, set scales="free_y":

ggplot(mtcars_melt, aes(x="", y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free_y")

Alternatively, scales="free" frees both the x and y axes; combined with x=variable, each facet then shows only its own variable, giving a similar result:

ggplot(mtcars_melt, aes(x=variable, y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free")


Edited: The code below should work for your particular data set:

library(reshape2)
library(ggplot2)
vars <- c("cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin")
Auto_melt <- melt(Auto[, vars])
ggplot(Auto_melt, aes(x="", y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free_y")

Edited with code to separate by mpg as requested: redefine vars to include "mpg01", melt the data with mpg01 as the id variable, and use mpg01 as the x aesthetic. The dput below is a 5-row sample for reproducibility:
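What id.vars="mpg01" does can be sketched in base R: the measured columns are stacked, and the id column is repeated once per measured column so that every value keeps its high/low label. This sketch assumes a hand-built subset (first three rows of the dput sample, with mpg01 kept as logical rather than a factor); it illustrates the shape of melt()'s output, not its exact implementation:

```r
# First three rows of the dput sample (mpg01 kept as logical here)
Auto_sample <- data.frame(
  mpg01      = c(TRUE, FALSE, FALSE),
  cylinders  = c(8L, 8L, 8L),
  horsepower = c(130L, 165L, 150L)
)
measure <- setdiff(names(Auto_sample), "mpg01")  # every column except the id

# Stack the measured columns (column by column) ...
Auto_long <- stack(Auto_sample[measure])
names(Auto_long) <- c("value", "variable")
# ... and repeat the id column once per measured column, in the same order
Auto_long$mpg01 <- rep(Auto_sample$mpg01, times = length(measure))
Auto_long
```

Each row now carries a variable name, a value and its mpg01 label, which is what lets aes(x=mpg01, y=value) split every facet into high and low mileage.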

Auto <- structure(list(mpg01 = structure(c(2L, 1L, 1L, 1L, 1L), .Label = c("FALSE", "TRUE"), class = "factor"), cylinders = c(8L, 8L, 8L, 8L, 8L), displacement = c(307, 350, 318, 304, 302), horsepower = c(130L, 165L, 150L, 150L, 140L), weight = c(3504L, 3693L, 3436L, 3433L, 3449L), acceleration = c(12, 11.5, 11, 12, 10.5), year = c(70L, 70L, 70L, 70L, 70L), origin = c(1L, 1L, 1L, 1L, 1L)), .Names = c("mpg01", "cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin"), row.names = c(NA, 5L), class = "data.frame") 

vars <- c("mpg01", "cylinders", "displacement", "horsepower", "weight", "acceleration", "year", "origin")
Auto_melt <- melt(Auto[, vars], id.vars="mpg01")
ggplot(Auto_melt, aes(x=mpg01, y=value, fill=variable)) + geom_boxplot() + facet_wrap(~variable, scales="free_y")


Upvotes: 2

Vida Wang

Reputation: 416

I think you should tidy your data first and then draw the boxplot. I downloaded the data from the website (in this df, the first column still holds the raw mpg values):

> head(df)
  mpg01 cylinders displacement horsepower weight acceleration year origin
1    18         8          307        130   3504         12.0   70      1
2    15         8          350        165   3693         11.5   70      1
3    18         8          318        150   3436         11.0   70      1
4    16         8          304        150   3433         12.0   70      1
5    17         8          302        140   3449         10.5   70      1
6    15         8          429        198   4341         10.0   70      1

Use tidyr::gather to tidy the data:

library("tidyr")
library("dplyr")
library("ggplot2")
tidy_df <- df %>% gather("vars","values",-mpg01)

tidy_df then looks like this:

> head(tidy_df)
  mpg01      vars values
1    18 cylinders      8
2    15 cylinders      8
3    18 cylinders      8
4    16 cylinders      8
5    17 cylinders      8
6    15 cylinders      8

Then you can draw the boxplot:

ggplot(data=tidy_df,aes(vars,values)) + geom_boxplot(aes(fill=vars))

It looks like this: a single plot with one box per variable, all sharing one y-axis.

Upvotes: 1
