Reputation: 5831
I have the following data:
Splice.Pair proportion
1 AA-AG 0.010909091
2 AA-GC 0.003636364
3 AA-TG 0.003636364
4 AA-TT 0.007272727
5 AC-AC 0.003636364
6 AC-AG 0.003636364
7 AC-GA 0.003636364
8 AC-GG 0.003636364
9 AC-TC 0.003636364
10 AC-TG 0.003636364
11 AC-TT 0.003636364
12 AG-AA 0.010909091
13 AG-AC 0.007272727
14 AG-AG 0.003636364
15 AG-AT 0.003636364
16 AG-CC 0.003636364
17 AG-CT 0.007272727
... ... ...
I want to get a barchart visualising the proportion of each splice pair but only for splice pairs that have a proportion over, say, 0.004. I tried the following:
nc.subset <- subset(nc.dat, proportion > 0.004)
qplot(Splice.Pair, proportion, data=nc.subset, geom="bar", xlab="Splice Pair", ylab="Proportion of total non-canonical splice sites") + coord_flip()
But this just gives me a bar chart with all splice pairs on the Y-axis, except that the splice pairs that were filtered out are missing bars.
I have no idea what is happening to allow all categories to still be present :s
Upvotes: 5
Views: 4184
Reputation: 173577
What's happening is that Splice.Pair is a factor. When you subset your data frame, the factor retains its levels attribute, which still lists all of the original levels, so the plot reserves an axis position for each of them. You can avoid this kind of problem by wrapping your subsetting in droplevels:
nc.subset <- droplevels(subset(nc.dat, proportion > 0.004))
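To see the retained levels directly, here is a minimal sketch on a toy data frame. The name toy and its three rows are made up for illustration and only stand in for nc.dat:

```r
# Toy stand-in for nc.dat (values are illustrative, not the real data)
toy <- data.frame(
  Splice.Pair = factor(c("AA-AG", "AA-GC", "AG-AA")),
  proportion  = c(0.0109, 0.0036, 0.0109)
)

# Subsetting drops the row, but the factor keeps every original level
kept <- subset(toy, proportion > 0.004)
nrow(kept)                      # two rows remain
levels(kept$Splice.Pair)        # "AA-GC" is still listed as a level

# droplevels() rebuilds each factor using only the values still present
kept_clean <- droplevels(kept)
levels(kept_clean$Splice.Pair)
```

Comparing the two levels() calls shows why the plot still reserves a slot for the filtered-out pairs.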
More generally, if you dislike this kind of automatic retention of levels with factors, you can set R to store strings as character vectors rather than factors by default by setting:
options(stringsAsFactors = FALSE)
at the beginning of your R session (stringsAsFactors = FALSE can also be passed as an argument to data.frame).
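As a rough illustration of the per-call form, the sketch below passes stringsAsFactors = FALSE to data.frame. The toy_chr data is invented for the example; note that in R 4.0.0 and later this is already the default, so the argument only matters on older versions:

```r
# Illustrative data only; Splice.Pair is kept as a plain character column
toy_chr <- data.frame(
  Splice.Pair = c("AA-AG", "AA-GC", "AG-AA"),
  proportion  = c(0.0109, 0.0036, 0.0109),
  stringsAsFactors = FALSE
)

kept_chr <- subset(toy_chr, proportion > 0.004)

# A character vector has no levels attribute, so nothing is left behind
is.character(kept_chr$Splice.Pair)
unique(kept_chr$Splice.Pair)
```

With a character column, the plotting code only ever sees the values that survive the subset.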
EDIT
Regarding the issue of running older versions of R that may lack droplevels, @rcs points out in a comment that the method for a single factor is very simple to implement on your own. The method for data frames is only slightly more complicated:
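function (x, except = NULL, ...)
{
    # TRUE for every column that is a factor
    ix <- vapply(x, is.factor, NA)
    # columns named in 'except' keep their levels untouched
    if (!is.null(except))
        ix[except] <- FALSE
    # re-creating a factor with factor() keeps only the levels in use
    x[ix] <- lapply(x[ix], factor)
    x
}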
But of course, the best solution is still to upgrade to the latest version of R.
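For a single factor, the same idea presumably boils down to calling factor() on it again, as the data frame method above does per column. A minimal sketch, where drop_unused is a hypothetical helper name and not part of base R:

```r
# Hypothetical helper: rebuild a factor so only the levels in use remain
drop_unused <- function(f) factor(f)

f <- factor(c("AA-AG", "AA-GC", "AG-AA"))
f_sub <- f[f != "AA-GC"]

levels(f_sub)               # unused level "AA-GC" is still present
levels(drop_unused(f_sub))  # only the levels that actually occur
```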
Upvotes: 6
Reputation: 1595
Check whether Splice.Pair is a factor. If it is, use droplevels() to remove the levels that are no longer used; that should resolve your problem.
nc.subset <- subset(nc.dat, proportion > 0.004)
nc.subset$Splice.Pair <- droplevels(nc.subset$Splice.Pair)
qplot(Splice.Pair, proportion, data=nc.subset, geom="bar", xlab="Splice Pair", ylab="Proportion of total non-canonical splice sites") + coord_flip()
You may be able to incorporate droplevels into the qplot call, but that's for you to find out :-)
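One possible way to do the subsetting and level-dropping in a single expression is sketched below. It is an assumption-laden example: the toy data is invented, and it uses ggplot() with geom_col() rather than qplot, which assumes a reasonably recent ggplot2 is installed (the plot is skipped otherwise):

```r
# Illustrative stand-in for nc.dat
toy <- data.frame(
  Splice.Pair = factor(c("AA-AG", "AA-GC", "AG-AA")),
  proportion  = c(0.0109, 0.0036, 0.0109)
)

# Filter and drop unused levels in one step
plot_dat <- droplevels(subset(toy, proportion > 0.004))

# Only attempt the plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(plot_dat, ggplot2::aes(x = Splice.Pair, y = proportion)) +
    ggplot2::geom_col() +   # bar heights taken directly from 'proportion'
    ggplot2::coord_flip() +
    ggplot2::labs(x = "Splice Pair",
                  y = "Proportion of total non-canonical splice sites")
  print(p)
}
```

The same droplevels(subset(...)) expression could equally be passed straight to the data argument of the plotting call.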
Upvotes: 1