Reputation: 7986
I am trying to use ggplot to plot production data by company, using the color of each point to designate the year. The following chart shows an example based on sample data:
However, my real data often has 50-60 different companies, which makes the company names on the y axis tightly packed and not very aesthetically pleasing.
What is the easiest way to show data for only the top 5 companies (ranked by 2011 quantities) and then aggregate the rest and show them as "Other"?
Below is some sample data and the code I have used to create the sample chart:
library(ggplot2)
# create some sample data
c=c("AAA","BBB","CCC","DDD","EEE","FFF","GGG","HHH","III","JJJ")
q=c(1,2,3,4,5,6,7,8,9,10)
y=c(2010)
df1=data.frame(Company=c, Quantity=q, Year=y)
q=c(3,4,7,8,5,14,7,13,2,1)
y=c(2011)
df2=data.frame(Company=c, Quantity=q, Year=y)
df=rbind(df1, df2)
# create plot
p=ggplot(data=df,aes(Quantity,Company))+
  geom_point(aes(color=factor(Year)),size=4)
p
I started down the path of a brute-force approach but thought there is probably a simple and elegant way to do this that I should learn. Any assistance would be greatly appreciated.
Upvotes: 4
Views: 15789
Reputation: 32859
See if this is what you want. It takes your df dataframe and some of the ideas already suggested by @cbeleites. The steps are:
1. Select the 2011 data and order the companies from highest to lowest Quantity.
2. Split df into two parts: dftop, which contains the data for the top 5; and dfother, which contains the aggregated data for the other companies (using ddply() from the plyr package).
3. Put the two dataframes together to give dfnew.
4. Set the order in which the levels of Company are plotted: top to bottom is highest to lowest, then "Other". The order comes from the first five entries of companies (reversed), plus "Other".
5. Plot as before.
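If you would rather not depend on plyr, the aggregation in step 2 can also be done with base R's aggregate(). A minimal sketch, assuming the question's sample data (rebuilt here so the block runs on its own) and the same top-5 cut-off; top5 is a name introduced only for this sketch:

```r
# Rebuild the question's sample data
company_names <- c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH", "III", "JJJ")
df <- rbind(
  data.frame(Company = company_names, Quantity = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Year = 2010),
  data.frame(Company = company_names, Quantity = c(3, 4, 7, 8, 5, 14, 7, 13, 2, 1), Year = 2011)
)

# Step 1: rank companies by their 2011 quantity and keep the first five names
df2011 <- subset(df, Year == 2011)
top5 <- as.character(df2011$Company[order(df2011$Quantity, decreasing = TRUE)])[1:5]

# Step 2 (base R): sum all companies outside the top 5, one row per year
dfother <- aggregate(Quantity ~ Year, data = subset(df, !(Company %in% top5)), FUN = sum)
dfother$Company <- "Other"
```

The result has the same Year, Quantity and Company columns as the ddply() version, so it can be passed to rbind() in step 3 unchanged.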
library(ggplot2)
library(plyr)
# Step 1
df2011 <- subset (df, Year == 2011)
companies <- df2011$Company [order (df2011$Quantity, decreasing = TRUE)]
# Step 2
dftop = subset(df, Company %in% companies [1:5])
dftop$Company = as.character(dftop$Company)  # works whether Company is a factor or character; levels are set in Step 4
dfother = ddply(subset(df, !(Company %in% companies [1:5])), .(Year), summarise, Quantity = sum(Quantity))
dfother$Company = "Other"
# Step 3
dfnew = rbind(dftop, dfother)
# Step 4
dfnew$Company = factor(dfnew$Company, levels = c("Other", rev(as.character(companies)[1:5])))
levels(dfnew$Company) # Check that the levels are in the correct order
# Step 5
p = ggplot (data = dfnew, aes (Quantity, Company)) +
  geom_point (aes (color = factor (Year)), size = 4)
p
The code produces:
Upvotes: 3
Reputation: 14093
What about this:
df2011 <- subset (df, Year == 2011)
companies <- df2011$Company [order (df2011$Quantity, decreasing = TRUE)]
ggplot (data = subset (df, Company %in% companies [1 : 5]),
        aes (Quantity, Company)) +
  geom_point (aes (color = factor (Year)), size = 4)
BTW: for code to be called elegant, spend a few more spaces; they aren't that expensive...
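A sketch that combines this shorter approach with the "Other" row from the other answer, written as a reusable function. lump_top_n() and its n and rank_year arguments are hypothetical names introduced here, not part of ggplot2 or plyr; it assumes a data frame with Company, Quantity and Year columns as in the question:

```r
# Sample data from the question
company_names <- c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH", "III", "JJJ")
df <- rbind(
  data.frame(Company = company_names, Quantity = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Year = 2010),
  data.frame(Company = company_names, Quantity = c(3, 4, 7, 8, 5, 14, 7, 13, 2, 1), Year = 2011)
)

# Hypothetical helper (not from any package): keep the top `n` companies,
# ranked by Quantity in `rank_year`, and sum the rest into "Other" per year.
lump_top_n <- function(df, n = 5, rank_year = 2011) {
  ranked <- df[df$Year == rank_year, ]
  top <- as.character(ranked$Company[order(ranked$Quantity, decreasing = TRUE)])
  top <- head(top, n)

  keep <- as.character(df$Company) %in% top
  out <- df[keep, c("Company", "Quantity", "Year")]
  out$Company <- as.character(out$Company)

  # Only build an "Other" row when some companies fall outside the top n
  if (any(!keep)) {
    dfother <- aggregate(Quantity ~ Year, data = df[!keep, ], FUN = sum)
    dfother$Company <- "Other"
    out <- rbind(out, dfother[, c("Company", "Quantity", "Year")])
  }

  # The first factor level is drawn at the bottom of a discrete y axis,
  # so "Other" sits at the bottom and the largest company at the top.
  levs <- c("Other", rev(top))
  out$Company <- factor(out$Company, levels = levs[levs %in% out$Company])
  out
}

# Build the plot only if ggplot2 is installed; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(lump_top_n(df, n = 5), aes(Quantity, Company)) +
    geom_point(aes(color = factor(Year)), size = 4)
}
```

Changing n controls how many companies are shown individually; when every company is in the top n, no "Other" row or level is added.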
Upvotes: 6