Alex

Reputation: 353

plot() and boxplot() (using package ggplot) give different results

I wrote two pieces of code, and they seem to give different results.

I think the plot produced by plot() is correct.

Can anyone tell me why the result from boxplot() differs from the one from plot()? Something seems to be wrong with the plot produced by boxplot().

BTW, the dataset is Boston, which comes from the MASS package in R.

install.packages('MASS')
library(MASS)
data_set1<-data(Boston,package='MASS')
attach(Boston)
par(mfrow=c(1,2))
boxplot(rad,crim,log='y')
plot(crim~as.factor(rad),log='y')

Best, Mati

Upvotes: 1

Views: 176

Answers (1)

MrFlick

Reputation: 206197

It's important to note that boxplot and plot are generic functions that behave differently based on what is passed to them. In this case, because you specify a factor as your x variable in the plot, it really comes down to comparing

boxplot(rad, crim, log='y')
boxplot(crim ~ as.factor(rad),log='y')

So you are either passing two separate parameters, as in the first case, or a formula, as in the second. These behave very differently. If you don't use a formula, you just get a box plot for each variable you pass in. You can see what happens if you add other column names

boxplot(rad, crim, zn, dis, log='y')

There you can see that you just get a separate box plot for each of the variables you pass in. The "1" is the distribution of the rad variable across all observations, the "2" is crim, and so on.
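As a rough check of this, you can ask boxplot() to return its summary statistics without drawing anything (plot = FALSE) and count the boxes. This is only a sketch: the object name per_variable is made up for illustration, and Boston is accessed through with() instead of attach().

```r
library(MASS)  # provides the Boston data frame

# Non-formula call: every unnamed argument becomes its own box.
per_variable <- with(Boston, boxplot(rad, crim, zn, dis, plot = FALSE))

ncol(per_variable$stats)  # one column of box statistics per box
per_variable$names        # default labels given to the unnamed arguments
per_variable$n            # number of observations behind each box
```

Each box here summarises a whole column, so rad is never used to group crim.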

When you call

boxplot(crim ~ as.factor(rad),log='y')

You are getting a box plot of crim for each unique value of rad. It's not really possible to add other variables when using the formula syntax.
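The same kind of check works for the formula call. Again this is a sketch under the assumption that MASS is installed; per_group is a hypothetical name, and plot = FALSE is used only to inspect the groups that boxplot() would draw.

```r
library(MASS)  # provides the Boston data frame

# Formula call: crim is split by the levels of rad, one box per level.
per_group <- boxplot(crim ~ as.factor(rad), data = Boston, plot = FALSE)

per_group$names             # group labels, taken from the levels of rad
ncol(per_group$stats)       # number of boxes
length(unique(Boston$rad))  # number of distinct rad values, for comparison
```

The number of boxes matches the number of distinct rad values, which is the grouping that plot(crim ~ as.factor(rad)) shows as well.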

See the ?boxplot help page for more details.

Also, I should mention that it's usually a bad idea to use attach(). It would be better to use the data= parameter for functions that support it and with() for functions that do not. For example

with(Boston, boxplot(crim, rad, log="y"))
boxplot(crim~rad, log="y", data=Boston)

Upvotes: 2
