Reputation: 60756
I have been struggling with how to make a Pareto chart in R using the ggplot2 package. In many cases when making a bar chart or histogram we want items sorted by their X-axis value. In a Pareto chart we want the items in descending order of their Y-axis value. Is there a way to get ggplot to plot items ordered by the value on the Y axis? I tried sorting the data frame first, but it seems ggplot reorders them.
Example:
library(ggplot2)
val <- read.csv("http://www.cerebralmastication.com/wp-content/uploads/2009/11/val.txt")
val<-with(val, val[order(-Value), ])
p <- ggplot(val)
p + geom_bar(aes(State, Value, fill=variable), stat = "identity", position="dodge") + scale_fill_brewer(palette = "Set1")
The data frame val is sorted, but the output looks like this:
(Image omitted; source: cerebralmastication.com)
Hadley correctly pointed out that the following produces a much better graphic for showing actuals vs. predicted:
ggplot(val, aes(State, Value)) +
  geom_bar(data = subset(val, variable == "estimate"), stat = "identity", fill = "grey70") +
  geom_crossbar(data = subset(val, variable == "actual"), aes(ymin = Value, ymax = Value))
which returns:
(Image omitted; source: cerebralmastication.com)
But it's still not a Pareto Chart. Any tips?
Upvotes: 20
Views: 27134
Reputation: 1599
We can use the ggQC package.
library(ggplot2)
library(ggQC)
Data4Pareto <- data.frame(
  KPI = c("Customer Service Time", "Order Fulfillment", "Order Processing Time",
          "Order Production Time", "Order Quality Control Time", "Rework Time",
          "Shipping"),
  Time = c(1.50, 38.50, 3.75, 23.08, 1.92, 3.58, 73.17))

ggplot2::ggplot(Data4Pareto, aes(x = KPI, y = Time)) +
  ggQC::stat_pareto(point.color = "red",
                    point.size = 3,
                    line.color = "black",
                    bars.fill = c("blue", "orange")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Upvotes: 5
Reputation: 368439
Subset and sort your data:
valact <- subset(val, variable=='actual')
valsort <- valact[ order(-valact[,"Value"]),]
From there it's just a standard barplot() with a very manual cumulative line on top:
op <- par(mar = c(3, 3, 3, 3))
bp <- barplot(valsort[, "Value"], ylab = "", xlab = "", ylim = c(0, 1),
              names.arg = as.character(valsort[, "State"]), main = "How's that?")
# Cumulative share of the total, drawn on the same 0-1 scale as the bars
lines(bp, cumsum(valsort[, "Value"]) / sum(valsort[, "Value"]), col = "red")
axis(4)  # right-hand axis for the cumulative share
box()
par(op)
which should look like this:
(Image omitted; source: eddelbuettel.com)
It doesn't even need the overplotting trick, as lines() happily annotates the initial plot.
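The ylim = c(0, 1) setting above only lines up when the values are already proportions. For raw counts, one option is to scale the cumulative curve to the bar axis and label the right-hand axis in percent. A minimal base-graphics sketch, using hypothetical counts rather than the question's data:

```r
# Hypothetical counts per category (not from val.txt)
counts <- c(C = 20, A = 40, E = 5, B = 25, D = 10)
counts <- sort(counts, decreasing = TRUE)   # Pareto order: largest first
cum_prop <- cumsum(counts) / sum(counts)    # cumulative proportion, ends at 1

op <- par(mar = c(3, 3, 3, 4))              # leave room for the right axis
bp <- barplot(counts, ylim = c(0, sum(counts)),
              main = "Pareto chart (base graphics)")
# Draw the cumulative curve on the bar scale so no second plot is needed
lines(bp, cum_prop * sum(counts), type = "b", col = "red")
# Right-hand axis labels the same heights as cumulative percentages
axis(4, at = seq(0, 1, by = 0.25) * sum(counts),
     labels = paste0(seq(0, 100, by = 25), "%"))
box()
par(op)
```

Because the left axis runs from 0 to the grand total, the cumulative curve ends exactly at the top of the plot.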
Upvotes: 23
Reputation: 7905
library(ggplot2)

freqplot <- function(x, by = NULL, right = FALSE) {
  if (is.null(by)) stop('A value for "by" must be specified.')
  # Class boundaries from min(x) to max(x) in steps of `by`
  breaks <- seq(min(x), max(x), by = by)
  # Relative frequency per class; include.lowest = TRUE keeps a value equal
  # to the last break (with right = FALSE) instead of turning it into NA
  tab <- table(cut(x, breaks = breaks, right = right, include.lowest = TRUE))
  tab <- tab / sum(tab)
  intervs <- factor(names(tab), levels = names(tab))
  freq <- as.numeric(tab)
  acum <- as.numeric(cumsum(tab))
  # Rescale the cumulative frequencies to [0, 1]
  normalize.vec <- function(x) (x - min(x)) / (max(x) - min(x))
  dados <- data.frame(classe = intervs, freq = freq, acum = acum,
                      acum_norm = normalize.vec(acum))
  p <- ggplot(dados) +
    geom_bar(aes(classe, freq, fill = classe), stat = "identity") +
    geom_point(aes(classe, acum_norm, group = 1), shape = 1, size = 3, colour = "gray20") +
    geom_line(aes(classe, acum_norm, group = 1), colour = "gray20")
  p
}
Upvotes: 0
Reputation: 2155
A traditional Pareto chart in ggplot2.
Developed after reading Cano, E. L., Moguerza, J. M., & Redchuk, A. (2012). Six Sigma with R (R. Gentleman, K. Hornik, & G. Parmigiani, Eds.). Springer.
library(ggplot2);library(grid)
counts <- c(80, 27, 66, 94, 33)
defects <- c("price code", "schedule date", "supplier code", "contact num.", "part num.")
dat <- data.frame(count = counts, defect = defects, stringsAsFactors=FALSE )
dat <- dat[order(dat$count, decreasing=TRUE),]
dat$defect <- factor(dat$defect, levels=dat$defect)
dat$cum <- cumsum(dat$count)
count.sum<-sum(dat$count)
dat$cum_perc<-100*dat$cum/count.sum
p1<-ggplot(dat, aes(x=defect, y=cum_perc, group=1))
p1<-p1 + geom_point(aes(colour=defect), size=4) + geom_path()
p1<-p1+ ggtitle('Pareto Chart')+ theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(),axis.text.x = element_blank())
p1<-p1+theme(legend.position="none")
p2<-ggplot(dat, aes(x=defect, y=count,colour=defect, fill=defect))
p2 <- p2 + geom_bar(stat = "identity")  # bar heights are the counts themselves
p2 <- p2 + theme(legend.position = "none")
# Stack the two plots vertically on one page
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 1)))
print(p1, vp = viewport(layout.pos.row = 1,layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 2,layout.pos.col = 1))
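For these counts, the cumulative-percentage column works out to about 31%, 58%, 80%, 91% and 100%, so the top three defect types account for 80% of the total. A base-R sketch that only reproduces that arithmetic, using the same counts and defects vectors as above:

```r
counts <- c(80, 27, 66, 94, 33)
defects <- c("price code", "schedule date", "supplier code", "contact num.", "part num.")

# Sort in decreasing order, then take the running total as a share of the grand total
ord <- order(counts, decreasing = TRUE)
cum_perc <- 100 * cumsum(counts[ord]) / sum(counts)

data.frame(defect = defects[ord], count = counts[ord], cum_perc = round(cum_perc, 1))
```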
Upvotes: 7
Reputation: 121127
To simplify things, let's consider only the estimates.
estimates <- subset(val, variable == "estimate")
First we reorder the factor levels, so that States are plotted in decreasing order of Value.
estimates$State <- with(estimates, reorder(State, -Value))
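What reorder() does can be checked in isolation with a few hypothetical values (not the question's data). The levels are sorted by the mean of the second argument, so negating Value puts the largest first; the rows themselves stay where they are.

```r
# Hypothetical toy values
State <- c("AL", "CA", "NY", "TX")
Value <- c(3, 10, 7, 5)

# Levels are ordered by increasing mean of -Value, i.e. decreasing Value
State_f <- reorder(State, -Value)
levels(State_f)
# [1] "CA" "NY" "TX" "AL"
```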
Similarly, we reorder the dataset and calculate a cumulative value.
estimates <- estimates[order(estimates$Value, decreasing = TRUE),]
estimates$cumulative <- cumsum(estimates$Value)
Now we are ready to draw the plot. The trick to get a line and bars on the same axes is to convert the State variable (a factor) to numeric.
p <- ggplot(estimates, aes(State, Value)) +
  geom_bar(stat = "identity") +
  geom_line(aes(as.numeric(State), cumulative))
p
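A variant of the plot above as a minimal sketch, assuming ggplot2 2.2.0 or later (for geom_col() and sec_axis()) and hypothetical data in place of val.txt. Instead of converting State to numeric, it keeps the discrete axis and uses group = 1 so the line connects across categories, then adds a right-hand axis showing the cumulative share as a percentage.

```r
library(ggplot2)

# Hypothetical data standing in for the estimates subset
estimates <- data.frame(
  State = c("AL", "CA", "NY", "TX", "WA"),
  Value = c(12, 40, 25, 18, 5),
  stringsAsFactors = FALSE
)

# Sort descending, then fix the factor levels in that order for the x axis
estimates <- estimates[order(estimates$Value, decreasing = TRUE), ]
estimates$State <- factor(estimates$State, levels = estimates$State)

# Running total; the grand total is used to rescale the secondary axis
total <- sum(estimates$Value)
estimates$cumulative <- cumsum(estimates$Value)

p <- ggplot(estimates, aes(State, Value)) +
  geom_col(fill = "grey70") +
  # group = 1 joins the points across the discrete x axis
  geom_line(aes(y = cumulative, group = 1), colour = "red") +
  geom_point(aes(y = cumulative), colour = "red") +
  # Right-hand axis relabels the same scale as a cumulative percentage
  scale_y_continuous(sec.axis = sec_axis(~ . / total * 100, name = "Cumulative %"))
p
```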
As mentioned in the question, drawing two Pareto plots of two variable groups right next to each other isn't easy. You'd probably be better off using faceting if you want multiple Pareto plots.
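One way to do the faceting, sketched with hypothetical data: each panel needs its own ordering, so the x variable below is a hypothetical key built from variable and State, with the prefix stripped off again in the axis labels. Cumulative sums are computed per group with ave().

```r
library(ggplot2)

# Hypothetical data: one row per State and variable
df <- data.frame(
  variable = rep(c("actual", "estimate"), each = 3),
  State = rep(c("A", "B", "C"), times = 2),
  Value = c(5, 9, 2, 6, 8, 3),
  stringsAsFactors = FALSE
)

# Sort within each group by decreasing Value, then accumulate per group
df <- df[order(df$variable, -df$Value), ]
df$cum <- ave(df$Value, df$variable, FUN = cumsum)

# Hypothetical "variable__State" key so each panel keeps its own order
key <- paste(df$variable, df$State, sep = "__")
df$key <- factor(key, levels = key)

p <- ggplot(df, aes(key, Value)) +
  geom_col() +
  geom_line(aes(y = cum, group = variable)) +
  facet_wrap(~ variable, scales = "free_x") +
  # Show only the State part of the key on the axis
  scale_x_discrete(labels = function(x) sub("^.*__", "", x))
p
```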
Upvotes: 1
Reputation: 3708
With a simple example:
> data
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
0.29056 0.23833 0.11003 0.05549 0.04678 0.03788 0.02770 0.02323 0.02211 0.01925
barplot(data) does things correctly: the bars keep the vector's order.
The ggplot equivalent "should be": qplot(x = names(data), y = data, geom = "col")
But that sorts the bars alphabetically, because that is how levels(factor(names(data))) would be ordered.
Solution: qplot(x = factor(names(data), levels = names(data)), y = data, geom = "col")
Phew!
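A quick base-R check of that level ordering, assuming data is a named numeric vector holding the values printed above:

```r
# Named vector with the values printed above
data <- c(PC1 = 0.29056, PC2 = 0.23833, PC3 = 0.11003, PC4 = 0.05549, PC5 = 0.04678,
          PC6 = 0.03788, PC7 = 0.02770, PC8 = 0.02323, PC9 = 0.02211, PC10 = 0.01925)

# Default levels are sorted as strings: PC1, PC10, PC2, ..., PC9
levels(factor(names(data)))

# Explicit levels keep the vector's own (already descending) order
levels(factor(names(data), levels = names(data)))
```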
Upvotes: 4
Reputation: 25367
The bars in ggplot2 are ordered by the levels of the factor, so set the levels in decreasing order of Value (unique() is needed because each State appears once per variable):
val$State <- with(val, factor(State, levels = unique(State[order(-Value)])))
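Because val has one row per State and variable, each State appears more than once, and factor() rejects duplicated levels. A minimal sketch with a hypothetical toy data frame standing in for val; note that this orders each State by its largest single Value:

```r
# Toy stand-in for val: each State appears twice (actual and estimate)
val <- data.frame(
  State = rep(c("A", "B", "C"), each = 2),
  variable = rep(c("actual", "estimate"), times = 3),
  Value = c(5, 6, 9, 8, 2, 3),
  stringsAsFactors = FALSE
)

# States in order of decreasing Value; unique() drops the repeats
lev <- unique(val$State[order(-val$Value)])
val$State <- factor(val$State, levels = lev)
levels(val$State)
# [1] "B" "A" "C"
```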
Upvotes: 16
Reputation: 368439
Also, see the package qcc, which has a function pareto.chart(). It looks like it uses base graphics too, so start your bounty for a ggplot2 solution :-)
Upvotes: 3