Okay, so I'm really stuck. I have a data set which looks like this:
                  Species Latitude Longitude            Oiling Condition BirdCount      Date_ Oil_Cond       Date week.number
 1        Northern Gannet 30.32860 -89.19810 Not Visibly Oiled      Live         1 2010-07-21        1 2010-07-21          30
 2          Laughing Gull 30.23172 -88.32127 Not Visibly Oiled      Live         1 2010-05-05        1 2010-05-05          19
 3        Northern Gannet 30.26677 -87.59248     Visibly Oiled      Live         1 2010-05-05        2 2010-05-05          19
 4 American White Pelican 29.29649 -89.66432 Not Visibly Oiled      Live         1 2010-05-05        1 2010-05-05          19
 5          Brown Pelican 29.88244 -88.87624     Visibly Oiled      Live         1 2010-05-08        2 2010-05-08          19
 6          Brown Pelican 29.00290 -89.36961 Not Visibly Oiled      Live         1 2010-05-14        1 2010-05-14          20
 7        Northern Gannet 30.33390 -85.56565           Unknown      Live         1 2010-05-17        6 2010-05-17          21
 8            Common Loon 30.28177 -87.51028 Not Visibly Oiled      Live         1 2010-05-17        1 2010-05-17          21
 9          Brown Pelican 30.41410 -88.24542     Visibly Oiled      Live         1 2010-05-18        2 2010-05-18          21
10        Northern Gannet 30.24063 -88.12451 Not Visibly Oiled      Live         1 2010-05-18        1 2010-05-18          21
And I'm trying to get a faceted histogram plotting the variable Oil_Cond for the 5 most frequent species of birds (there are over 100 unique bird species).
At first I wanted to produce a facet with all the species and used the following code:
qplot(Oil_Cond, data = birds, facets = Species ~., geom = "histogram")
But of course, that overloaded and wouldn't work because there would have been over 100 facets. So I decided that I really only care about the top 5 species anyway, and I worked out what they are and how often they appear (Laughing Gull: 3036, Brown Pelican: 789, Northern Gannet: 546, Royal Tern: 321, Black Skimmer: 258). However, I am at a loss as to how to plot only those five.
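Rather than typing the five names by hand, the top species can also be picked out programmatically. Below is a minimal base-R sketch, assuming `birds` has a `Species` column as shown above; the small `birds` built here is a made-up stand-in (hypothetical counts) so the snippet runs on its own.

```r
# Made-up stand-in for the real 'birds' data frame (hypothetical counts)
birds <- data.frame(
  Species = rep(c("Laughing Gull", "Brown Pelican", "Northern Gannet",
                  "Royal Tern", "Black Skimmer", "Common Loon"),
                times = c(6, 5, 4, 3, 2, 1)),
  Oil_Cond = rep(c(1, 2, 1), 7),
  stringsAsFactors = FALSE
)

# Count occurrences of each species, most frequent first
counts <- sort(table(birds$Species), decreasing = TRUE)

# Names of the (up to) five most frequent species
top5 <- head(names(counts), 5)

# Keep only the rows belonging to those species
birdsTop <- birds[birds$Species %in% top5, ]
```

`birdsTop` could then be passed to the original `qplot()` call in place of `birds`. If two species tie for fifth place, which one makes the cut depends on `sort()`'s ordering, so it is worth printing `counts` to check.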
Any help would be much appreciated.
Thank you :)
Amy
You could tackle this using the excellent plyr package...
# If you don't already have plyr installed, uncomment the next line:
# install.packages('plyr')
library(plyr)
library(ggplot2)  # needed for qplot() below
# First, find out how many of each species you have...
ns=ddply(birds,.(Species),summarise,n=length(Species))
# This will produce a table listing the number of each species you have
# (in the column 'n'). Type 'ns' to see the table.
# We can then rank the species occurrence, to see how important the different
# species are
ns$r = rank(-ns$n) # negative because 'rank' starts with the lowest number.
# have a look at the top 5 species:
subset(ns,r<=5)
# There are a couple of ways to proceed from here. Either we could get the
# top 5 species names from this 'ns' table:
# names=as.character(subset(ns,r<=5)$Species)
# and use joran's method, or we could merge the ns table and the original
# dataset (so that each species has an 'n' and 'r' attribute) and subset the
# data by species number or rank. I prefer the latter, as it allows you to
# flexibly change the species number threshold, e.g.:
birds=merge(birds,ns,by='Species')
# We've now added 'n' and 'r' columns to the birds data, so we can select
# our subset based on either of these columns:
birds.by.r=subset(birds,r<=5) # selects only the top 5 bird species
birds.by.n=subset(birds,n>=100) # selects all species with at least 100 occurrences
# Then just plot away!
qplot(Oil_Cond,data=birds.by.r,facets=Species~.,geom='histogram')
# or
qplot(Oil_Cond,data=birds.by.n,facets=Species~.,geom='histogram')
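One detail worth checking in the `rank()` step: by default `rank()` averages tied values, so species with equal counts share a fractional rank and `r<=5` may keep fewer than five species. A small sketch with made-up counts (species "A" to "F" are placeholders) comparing the default with `ties.method = "min"` and `ties.method = "first"`:

```r
# Made-up species counts with a tie at fifth place
ns <- data.frame(Species = c("A", "B", "C", "D", "E", "F"),
                 n       = c(10, 8, 6, 4, 2, 2))

# Default ties.method = "average": E and F both get rank 5.5
ns$r_avg <- rank(-ns$n)

# ties.method = "min": tied species share the lowest rank (5)
ns$r_min <- rank(-ns$n, ties.method = "min")

# ties.method = "first": ties broken by row order, so ranks are 1..6
ns$r_first <- rank(-ns$n, ties.method = "first")

sum(ns$r_avg <= 5)    # 4 species kept
sum(ns$r_min <= 5)    # 6 species kept (all tied species included)
sum(ns$r_first <= 5)  # exactly 5 kept
```

Which option is right depends on whether you want exactly five panels (`"first"`) or all species tied at the cutoff (`"min"`).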
The easiest thing to do here may be to simply plot a subset of your data. The only potential thing to be careful of is if the species variable is stored as a factor, rather than as strings. First create a subset:
birdsSub <- subset(birds, Species %in% c('Laughing Gull','Brown Pelican',
'Northern Gannet','Royal Tern','Black Skimmer'))
birdsSub <- droplevels(birdsSub)  # works whether Species is a factor or character
and then you should be able to pass this data frame to qplot
as you have before. The reason for the droplevels call
is that if Species is stored as a factor, all the species that no longer appear will 'come along for the ride' as unused factor levels, and you may end up with all 100 panels, all but five of them empty.
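To see the 'come along for the ride' effect, here is a small sketch with a made-up factor (the species names and values are illustrative only). Subsetting keeps every original level; `droplevels()` applied to the subsetted data frame removes the unused ones.

```r
# Made-up data: Species stored explicitly as a factor with four levels
birds <- data.frame(
  Species  = factor(c("Laughing Gull", "Brown Pelican", "Common Loon",
                      "Laughing Gull", "Osprey")),
  Oil_Cond = c(1, 2, 1, 1, 6)
)

# Keep only two species
birdsSub <- subset(birds, Species %in% c("Laughing Gull", "Brown Pelican"))

nlevels(birdsSub$Species)  # still 4: unused levels are retained

# droplevels() on a data frame drops unused levels from every factor column
birdsSub <- droplevels(birdsSub)

nlevels(birdsSub$Species)  # now 2
levels(birdsSub$Species)   # "Brown Pelican" "Laughing Gull"
```

Current ggplot2 releases also drop unused levels from facets by default (the `drop = TRUE` argument of `facet_grid()`), so the explicit call is mostly a safeguard there; it still matters for functions like `table()`, which reports zero counts for unused levels.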