Aaron

Reputation: 317

R Boxplot Graph: Odd Result

I am using the boxplot function in R 3.1.1, and I am trying to understand what is happening behind the scenes rather than fix my code.

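# The plot background is set on the png() device
png(filename = "plot1.png", bg = "white")
par(mfrow = c(1, 2))      # two panels side by side
par(mar = c(3, 4, 4, 1))  # bottom, left, top, right margins
# boxplot() takes the plot title via `main`; `title` is not one of its arguments
boxplot(emissions ~ year, col = "blue", xlab = "Year", ylab = "Emissions",
        main = "Pm25 Emissions 1999 and 2008", ylim = c(0, 6000))
boxplot(emissions2 ~ year2, col = "blue", xlab = "Year", ylab = "Emissions",
        main = "Pm25 Emissions per Year", ylim = c(0, 6000))
dev.off()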

The resulting output is:


From what I have read, the code should normally draw a box with whiskers, but instead it produces a linear mess of aligned dots that is no better than a bar chart. Any clues about what I have done wrong?

Thanks. The image is not posted because I don't have 10 reputation points.

Full code to download and load the data set for automated, temporary processing:

url <- "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip"
#######Erased to encourage the learning process...
NEI <- readRDS(mydata[2])
SCC <- readRDS(mydata[1])
year <- NEI[, 6]       # the year column
emissions <- NEI[, 4]  # the Emissions column
mat <- cbind(year, emissions)
png(file = "plot1.png")
....

summary(NEI) results:

   Emissions             year     
 Min.   :     0.0   Min.   :1999  
 1st Qu.:     0.0   1st Qu.:2002  
 Median :     0.0   Median :2005  
 Mean   :     3.4   Mean   :2004  
 3rd Qu.:     0.1   3rd Qu.:2008  
 Max.   :646952.0   Max.   :2008  

Upvotes: 1

Views: 177

Answers (2)

Paulo E. Cardoso

Reputation: 5856

As you may have noticed, the Emissions variable in NEI is strongly right-skewed: most values are close to zero and a few are enormous.

library(dplyr)
nei <- as_tibble(NEI)  # as_tibble() replaces the older as.tbl()
# Per-year summary statistics of Emissions
nei %>%
  group_by(year) %>%
  summarise(
    min    = min(Emissions),
    max    = max(Emissions),
    mean   = mean(Emissions),
    median = median(Emissions),
    Q25    = quantile(Emissions, probs = 0.25),
    Q75    = quantile(Emissions, probs = 0.75)
  )

The summary:

Source: local data frame [4 x 7]

  year min       max     mean      median          Q25        Q75
1 1999   0  66696.32 6.615401 0.040000000 0.0100000000 0.25600000
2 2002   0 646951.97 3.317747 0.007164684 0.0005436423 0.08000000
3 2005   0  58896.10 3.182719 0.006741885 0.0005283287 0.07000000
4 2008   0  20799.70 1.752560 0.005273130 0.0003983980 0.06162755
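To connect these numbers to the flat boxes, the sketch below applies the 1.5 * IQR outlier rule that boxplot() uses (through boxplot.stats(), default coef = 1.5) to the quartiles in the table above. One caveat: boxplot() takes its hinges from fivenum() rather than quantile(), so the real fences differ slightly; only the order of magnitude matters here.

```r
# Quartiles and maxima copied from the per-year summary above (computed with quantile())
q <- data.frame(
  year = c(1999, 2002, 2005, 2008),
  Q25  = c(0.0100000000, 0.0005436423, 0.0005283287, 0.0003983980),
  Q75  = c(0.25600000, 0.08000000, 0.07000000, 0.06162755),
  max  = c(66696.32, 646951.97, 58896.10, 20799.70)
)

# Values above Q75 + 1.5 * IQR are drawn as individual outlier points
q$upper_fence <- q$Q75 + 1.5 * (q$Q75 - q$Q25)
print(q)
```

Every upper fence is below 1 while the axis runs to ylim = c(0, 6000), so each box is squeezed into a sliver at zero, and every emission larger than the fence is drawn as its own dot, which stacks into the vertical line of points.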

Upvotes: 1

agstudy

Reputation: 121568

A boxplot is a representation of your data's distribution. More precisely, it is built from your data's quantiles.
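As a rough illustration of which quantities the box is drawn from, boxplot.stats() (the helper boxplot() calls for each group) returns the five statistics that are plotted plus the points flagged as outliers. The sample vector here is made up for illustration.

```r
# Made-up, right-skewed sample for illustration
x_small <- c(1, 2, 2, 3, 3, 3, 4, 5, 40)

s_small <- boxplot.stats(x_small)  # default coef = 1.5
s_small$stats  # 1 2 3 4 5: lower whisker, lower hinge, median, upper hinge, upper whisker
s_small$out    # 40: more than 1.5 * IQR above the upper hinge, so drawn as a dot
```

The box spans the hinges (2 and 4), the whiskers stop at the most extreme points within 1.5 * 2 = 3 of the hinges (1 and 5), and 40 lies beyond that, so it is plotted as an individual point.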

For example, if your quantiles coincide, you get only one horizontal line (the box and whiskers are flat) and your outliers appear as a vertical line of points.
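A minimal sketch of that degenerate case, using hypothetical data shaped like the Emissions column (mostly zeros with a few large values):

```r
# Hypothetical data: mostly zeros, a few large values
x_zero <- c(rep(0, 20), 3, 8, 150)

s_zero <- boxplot.stats(x_zero)
s_zero$stats  # 0 0 0 0 0: both hinges are 0, so the IQR is 0 and the box is flat
s_zero$out    # 3 8 150: every nonzero value lies beyond the fence
```

Because the IQR is 0, any value above 0 falls outside the fences, so boxplot(x_zero) draws one flat line at 0 with a column of three dots above it.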

You can picture your data as being distributed like this example:

set.seed(1)
# 800 zeros plus 200 values in (0, 1), assigned at random to two groups
d <- data.frame(count = c(rep(0, 800), runif(200)),
                spray = sample(1:2, 1000, replace = TRUE))
boxplot(count ~ spray, data = d, col = "lightgray")
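The same example can be checked numerically without drawing anything: with plot = FALSE, boxplot() returns the statistics it would have drawn.

```r
set.seed(1)
d <- data.frame(count = c(rep(0, 800), runif(200)),
                spray = sample(1:2, 1000, replace = TRUE))

# plot = FALSE returns the computed box statistics instead of drawing them
b <- boxplot(count ~ spray, data = d, plot = FALSE)
b$stats        # 5 x 2 matrix: one column of box statistics per spray group
length(b$out)  # number of points drawn as outliers
```

Each group is roughly 80% zeros, so both hinges are 0, the box collapses to a line at 0, and all 200 uniform values end up in b$out.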


Upvotes: 0
