Reputation: 353
I wrote two pieces of code that I expected to be equivalent, but they give different results.
I think the plot produced by plot() is correct.
Can anyone tell me why the result from boxplot() differs from plot()? The plot produced by boxplot() looks wrong.
BTW, the dataset is Boston, from the MASS package in R.
install.packages('MASS')
library(MASS)
data_set1<-data(Boston,package='MASS')
attach(Boston)
par(mfrow=c(1,2))
boxplot(rad,crim,log='y')
plot(crim~as.factor(rad),log='y')
Best, Mati
Upvotes: 1
Views: 176
Reputation: 206197
It's important to note that boxplot() and plot() are generic functions that behave differently based on what is passed to them. In this case, because you specify a factor as your x variable in the plot, it really comes down to comparing
boxplot(rad, crim, log='y')
boxplot(crim ~ as.factor(rad),log='y')
So you are passing two separate vectors in the first case and a formula in the second. These behave very differently. If you don't use a formula, you just get one box plot for each variable you pass in. You can see what happens if you add other column names
boxplot(rad, crim, zn, dis, log='y')
There you can see that you just get a separate box plot for each of the variables you pass in. The "1" is the distribution of the rad variable for all observations, the "2" is the distribution of crim, and so on.
When you call
boxplot(crim ~ as.factor(rad),log='y')
you get a box plot of crim for each unique value of rad. It's not really possible to add other variables when using the formula syntax.
See the ?boxplot help page for more details.
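If it helps to see the difference without looking at the plots, here is a minimal sketch that calls both forms with plot = FALSE and inspects the returned summaries. The names by_vector and by_group are just placeholders I made up for illustration, and the log= argument is left out because nothing is drawn.

```r
library(MASS)  # recommended package shipped with R; provides Boston

# Default method: every unnamed vector becomes its own box.
by_vector <- boxplot(Boston$rad, Boston$crim, plot = FALSE)
by_vector$names        # labels of the boxes, one per vector passed in
ncol(by_vector$stats)  # number of boxes that would be drawn

# Formula method: crim is split into groups by the levels of rad.
by_group <- boxplot(crim ~ as.factor(rad), data = Boston, plot = FALSE)
by_group$names         # one label per distinct value of rad
ncol(by_group$stats) == length(unique(Boston$rad))
```

The first call summarises rad and crim as two unrelated columns, while the second summarises crim once per rad group, which is what plot(crim ~ as.factor(rad)) shows.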
Also I should mention it's usually a bad idea to use attach(). It would be better to use the data= parameter for functions that support it and with() for functions that do not. For example
with(Boston, boxplot(crim, rad, log="y"))
boxplot(crim~rad, log="y", data=Boston)
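One reason attach() tends to cause trouble is that it puts a copy of the data frame on the search path, so later changes to the data frame are not seen through the bare column names. A small sketch with a made-up data frame (df and its column x are hypothetical, not part of Boston):

```r
# Toy data frame, purely for illustration.
df <- data.frame(x = 1:3)

attach(df)                    # a copy of df goes on the search path
df$x <- df$x * 10             # modify the data frame afterwards

stale   <- x                  # still the attached copy: 1, 2, 3
current <- with(df, x)        # with() reads the data frame as it is now
identical(stale, current)     # FALSE: the two have drifted apart

detach(df)                    # clean up the search path
```

With data= or with() the columns are always looked up in the data frame you name, so this kind of mismatch can't happen.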
Upvotes: 2