I have a data frame with a column of continuous values. I wanted to bin these values into another column so that I could produce a clearer plot. I did this like so:
library(Hmisc)  # cut2() is provided by the Hmisc package
#Add new column to data frame
mydf2["conDistanceBins"] <- NA
#Bin data from conDistance column of df into 5 quantile groups in new column
mydf2$conDistanceBins <- as.numeric(cut2(mydf2$conDistance, g=5))
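If you would rather not depend on Hmisc, similar quantile binning can be sketched in base R with quantile() and cut(). This is only an approximation of cut2(g = 5), not what cut2() does internally: the bin edges come from quantile()'s default method, so assignments near the edges may differ on real data. The helper name quantile_bins is hypothetical, and the input is just the five conDistance values shown in the question.

```r
# Hypothetical helper: base-R quantile binning, an approximation of Hmisc::cut2(x, g = 5)
quantile_bins <- function(x, g = 5) {
  # g + 1 edges at evenly spaced quantiles (0%, 20%, ..., 100% for g = 5)
  breaks <- quantile(x, probs = seq(0, 1, length.out = g + 1), na.rm = TRUE)
  # labels = FALSE returns integer bin numbers 1..g instead of interval labels;
  # include.lowest = TRUE keeps the minimum value in bin 1.
  # cut() stops with an error if heavy ties make two edges identical.
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# conDistance values from the five rows shown in the question
conDistance <- c(4.472136, 6.403124, 3.741657, 6.244998, 4.123106)
quantile_bins(conDistance)
# [1] 3 5 1 4 2
```

Applied to the full data frame this would be mydf2$conDistanceBins <- quantile_bins(mydf2$conDistance).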
Having done this, I went on to plot. When I produce a single plot with ggplot2 using the following code, it comes out correctly, coloured by the bins as I hoped:
p9 <- ggplot(mydf2, aes(x = x, y = y))
p9 + geom_point(aes(color=factor(mydf2$conDistanceBins)))
x and y are also columns of the mydf2 data frame.
My issue occurs when I try to produce a facet grid like so:
p7 <- ggplot(mydf2, aes(x, y)) + geom_point(aes(color=factor(mydf2$conDistanceBins)))
p7 + facet_grid(Chromosome~., margins = TRUE)
Chromosome is another column of my data frame. However, when I run this code I get the following error:
Error: Aesthetics must be either length 1 or the same as the data (12390): colour, x, y
What I do not understand is why my code works in one case but not in the other. Isn't the second snippet essentially the first one, just split into a facet grid by the Chromosome column of my data frame?
Edit: Here is a portion of my data frame.
x y z Gene Chromosome Pos.start boot_avg boot_low
1 -0.2201704 2.2914659 -1.0503592 AGAP000002 X 582 46 5
2 -1.6164962 -0.4252216 4.1920188 AGAP000007 X 83817 25 0
3 0.1585863 -2.1869117 0.5772591 AGAP000010 X 120773 79 2
4 -1.5126431 -0.2293787 2.9891040 AGAP000011 X 127704 54 10
5 -1.5382538 -0.1100106 -0.1838767 AGAP000012 X 146181 84 64
boot_avglow branch_avg branch_low branch_avglow conDistance invDistance
1 9 0.01891250 0.001469 0.001865 4.472136 3.464102
2 0 0.01518050 0.000000 0.000000 6.403124 7.416198
3 39 0.02026960 0.001955 0.003372 3.741657 5.099020
4 10 0.01040867 0.003530 0.003735 6.244998 7.280110
5 67 0.01626420 0.000257 0.001936 4.123106 3.000000
Acceptable Bootstrap Cluster conDistanceBins invDistanceBins
1 Below threshold 1 3 1
2 Below threshold 2 5 5
3 Above threshold 3 2 2
4 Above threshold 2 5 5
5 Above threshold 4 2 1
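It looks like your problem is coming from mydf2$conDistanceBins. If you change mydf2$conDistanceBins to just conDistanceBins, I don't get the error anymore.
A bare column name inside aes() is looked up in the data ggplot2 is actually drawing, whereas mydf2$conDistanceBins is always the original, full-length vector. With margins = TRUE, facet_grid() adds an "(all)" panel that contains a second copy of every row, so the layer data ends up with twice as many rows as mydf2 (presumably where the 12390 in the error comes from) and the lengths no longer match. Without faceting there is no extra copy, which is why your single plot works. See the code below: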
p7 <- ggplot(mydf2, aes(x, y)) + geom_point(aes(color=factor(conDistanceBins)))
p7 + facet_grid(Chromosome~., margins = TRUE)
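Why the mydf2$ version only fails once margins are added can be illustrated without ggplot2. The sketch below is a simplified imitation, not ggplot2's actual code: it assumes margins = TRUE amounts to appending a copy of every row with Chromosome set to "(all)", and it uses a small hypothetical data frame instead of your data.

```r
# Hypothetical toy data standing in for mydf2
df <- data.frame(
  x = c(0.1, 0.2, 0.3, 0.4),
  y = c(1.0, 0.5, 0.2, 0.8),
  Chromosome = c("X", "X", "2L", "2L"),
  conDistanceBins = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Simplified imitation of margins = TRUE: every row appears once in its own
# Chromosome panel and once more in the "(all)" margin panel
margin_copy <- df
margin_copy$Chromosome <- "(all)"
faceted <- rbind(df, margin_copy)

nrow(faceted)               # 8: rows the layer would be drawn from
length(df$conDistanceBins)  # 4: the vector referenced with df$ keeps its original length

# Looking the column up in the faceted data instead keeps the lengths in step
length(faceted$conDistanceBins) == nrow(faceted)  # TRUE
```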
Data: I only used the relevant pieces of your data:
mydf2<- structure(list(x = c(-0.2201704, -1.6164962, 0.1585863, -1.5126431,
-1.5382538), y = c(2.2914659, -0.4252216, -2.1869117, -0.2293787,
-0.1100106), z = c(-1.0503592, 4.1920188, 0.5772591, 2.989104,
-0.1838767), Gene = structure(1:5, .Label = c("AGAP000002", "AGAP000007",
"AGAP000010", "AGAP000011", "AGAP000012"), class = "factor"),
Chromosome = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "X", class = "factor"),
Pos.start = c(582L, 83817L, 120773L, 127704L, 146181L), boot_avg = c(46L,
25L, 79L, 54L, 84L), boot_low = c(5L, 0L, 2L, 10L, 64L),
boot_avglow = c(9L, 0L, 39L, 10L, 67L), branch_avg = c(0.0189125,
0.0151805, 0.0202696, 0.01040867, 0.0162642), branch_low = c(0.001469,
0, 0.001955, 0.00353, 0.000257), branch_avglow = c(0.001865,
0, 0.003372, 0.003735, 0.001936), conDistance = c(4.472136,
6.403124, 3.741657, 6.244998, 4.123106), invDistance = c(3.464102,
7.416198, 5.09902, 7.28011, 3), conDistanceBins = c(3, 5,
1, 4, 2)), row.names = c("1", "2", "3", "4", "5"), .Names = c("x",
"y", "z", "Gene", "Chromosome", "Pos.start", "boot_avg", "boot_low",
"boot_avglow", "branch_avg", "branch_low", "branch_avglow", "conDistance",
"invDistance", "conDistanceBins"), class = "data.frame")