Reputation: 317
I am using the boxplot function in R 3.1.1, and I am trying to understand what is happening behind the scenes rather than fix my code.
png(file = "plot1.png")
par(mfrow= c(1,2))
par(mar = c(3,4,4,1))
# boxplot() takes its title via main=; title= is not a recognised argument
boxplot(emissions ~ year, col = "blue", xlab = "Year", ylab = "Emissions", main = "Pm25 Emissions 1999 and 2008", bg = "white", ylim = c(0, 6000))
boxplot(emissions2 ~ year2, col = "blue", xlab = "Year", ylab = "Emissions", main = "Pm25 Emissions per Year", bg = "white", ylim = c(0, 6000))
dev.off()
The resulting output is:
From what I have read, this code should normally produce a box-and-whisker plot, but instead it returns a linear mess of aligned dots that is no better than a bar chart. Any clues as to what I have done wrong?
Thanks. The image is not posted because I don't have 10 reputation points.
Full code to download the data set for automated, temporary processing:
url = "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip"
#######Erased to encourage the learning process...
NEI <- readRDS(mydata[2])
SCC <- readRDS(mydata[1])
year <- (NEI[,6])
emissions <-( NEI[,4])
mat <- cbind(year,emissions)
png(file = "plot1.png")
....
summary(NEI) results:
   Emissions             year
 Min.   :     0.0   Min.   :1999
 1st Qu.:     0.0   1st Qu.:2002
 Median :     0.0   Median :2005
 Mean   :     3.4   Mean   :2004
 3rd Qu.:     0.1   3rd Qu.:2008
 Max.   :646952.0   Max.   :2008
Upvotes: 1
Views: 177
Reputation: 5856
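As you may have noticed, the Emissions variable in your NEI data is strongly right-skewed: most values are at or near zero, while a few are extremely large. A per-year summary makes this visible: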
library(dplyr)
# as.tbl() is deprecated in newer dplyr versions; as_tibble() is the replacement
nei <- as.tbl(NEI)
nei %>%
  group_by(year) %>%
  summarise(
    min    = min(Emissions),
    max    = max(Emissions),
    mean   = mean(Emissions),
    median = median(Emissions),
    Q25    = quantile(Emissions, probs = 0.25),
    Q75    = quantile(Emissions, probs = 0.75)
  )
The summary:
Source: local data frame [4 x 7]
  year min       max     mean      median          Q25        Q75
1 1999   0  66696.32 6.615401 0.040000000 0.0100000000 0.25600000
2 2002   0 646951.97 3.317747 0.007164684 0.0005436423 0.08000000
3 2005   0  58896.10 3.182719 0.006741885 0.0005283287 0.07000000
4 2008   0  20799.70 1.752560 0.005273130 0.0003983980 0.06162755
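One common way to make a plot of such skewed data readable is to draw the y-axis on a log scale. The following is only a sketch and not part of the summary above: `toy` is a hypothetical, synthetic stand-in for NEI (log-normal values, so strictly positive), and the parameters are made up for illustration. The real Emissions column contains zeros, which a log axis cannot show, so those rows would have to be dropped or offset first.

```r
# Hypothetical stand-in for NEI: strongly right-skewed, strictly positive values.
set.seed(42)
toy <- data.frame(
  year      = rep(c(1999, 2008), each = 500),
  Emissions = rlnorm(1000, meanlog = -4, sdlog = 3)
)

# Left panel: raw scale, where the box collapses near zero.
# Right panel: log-scaled y-axis, where the box and whiskers become visible.
op <- par(mfrow = c(1, 2))
boxplot(Emissions ~ year, data = toy, main = "Raw scale")
boxplot(Emissions ~ year, data = toy, log = "y", main = "Log scale (y)")
par(op)  # restore the previous layout
```

Comparing the two panels side by side shows that the plotting code itself is fine; the shape of the plot is driven by the data.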
Upvotes: 1
Reputation: 121568
boxplot is a representation of your data's distribution. More precisely, it depends on the quantile values of your data.
For example, if your quantiles coincide, you will get only a single horizontal line (the box and whiskers are flat), with your outliers drawn as a vertical line of points.
You can easily picture your data as being distributed like this example:
set.seed(1)
# 800 zeros plus 200 uniform values: the quartiles and the median are all 0
boxplot(count ~ spray,
        data = data.frame(count = c(rep(0, 800), runif(200)),
                          spray = sample(1:2, 1000, replace = TRUE)),
        col = "lightgray")
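To see the numbers behind such a flat box, `boxplot.stats()` returns the five values that `boxplot()` draws, along with the points it treats as outliers. This is a small sketch on the same kind of synthetic data as above, ignoring the grouping variable; the variable names `count` and `s` are chosen here for illustration.

```r
set.seed(1)
count <- c(rep(0, 800), runif(200))  # 80% zeros, 20% values in (0, 1)

s <- boxplot.stats(count)
s$stats        # lower whisker, lower hinge, median, upper hinge, upper whisker
length(s$out)  # number of points drawn individually as outliers
```

Because 80% of the values are zero, both hinges and the median are 0, the interquartile range is 0, and the whiskers cannot extend beyond the box. Every non-zero value therefore falls outside the whiskers and is plotted as a separate point, which produces the vertical line of dots.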
Upvotes: 0