Reputation: 6578
I'm reading in a file like so:
genes<-read.table("goi.txt",header=TRUE, row.names=1)
control<-log2(1+(genes[,1]))
experiment<-log2(1+(genes[,2]))
And plotting them as a simple scatter plot in ggplot:
ggplot(genes, aes(control, experiment)) +
xlim(0, 20) +
ylim(0, 20) +
geom_text(aes(control, experiment, label=row.names(genes)),size=3)
However, the points are incorrectly placed on my plot (see attached image).
This is my data:
control expt
gfi1 0.189634 3.16574
Ripply3 13.752000 34.40630
atonal 2.527670 4.97132
sox2 16.584300 42.73240
tbx15 0.878446 3.13560
hes8 0.830370 8.17272
Tlx1 1.349330 7.33417
pou4f1 3.763400 9.44845
pou3f2 0.444326 2.92796
neurog1 13.943800 24.83100
sox3 17.275700 26.49240
isl2 3.841100 10.08640
As you can see, 'Ripply3' is clearly in the wrong position on the graph!
Am I doing something really stupid?
Upvotes: 1
Views: 377
Reputation: 173727
The aes() function used by ggplot looks first inside the data frame you provide via data = genes. This is why you can (and should) specify variables only by bare column names like control; ggplot will automatically know where to find the data.
But R's scoping system is such that if nothing by that name is found in the current environment, R will look in the parent environment, and so on up to the global environment, until it finds something by that name.
So aes(control, experiment) looks for variables named control and experiment inside the data frame genes. It finds the original, untransformed control variable, but of course there is no experiment variable in genes (the column is called expt). So it continues up the chain of environments until it hits the global environment, where you have defined the isolated variable experiment, and uses that.
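The lookup order described above can be reproduced without ggplot. The following is a minimal sketch that uses base R's eval() as a stand-in for what aes() does; ggplot2's internal mechanism differs in detail, and the two-row data frame and the names x_used and y_used are assumptions made for illustration only.

```r
# Two rows copied from the question's data, with the same column names.
genes <- data.frame(
  control = c(0.189634, 13.752000),
  expt    = c(3.16574, 34.40630),
  row.names = c("gfi1", "Ripply3")
)

# The free-standing vectors created in the question.
control    <- log2(1 + genes[, 1])
experiment <- log2(1 + genes[, 2])

# eval() searches the data frame's columns first, then the enclosing
# environment. This mimics (it is not) the lookup that aes() performs.
x_used <- eval(quote(control), genes, environment())
y_used <- eval(quote(experiment), genes, environment())

# x_used is the raw column from genes; y_used is the log-transformed
# vector from the enclosing environment.
print(data.frame(gene = row.names(genes), x_used, y_used))
```

In this sketch Ripply3 ends up with a raw x value and a log-scale y value, which is the kind of mismatch that puts a label in the wrong place.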
You meant to do something more like this:
genes$controlLog <- log2(1+(genes[,1]))
genes$exptLog <- log2(1+(genes[,2]))
followed by:
ggplot(genes, aes(controlLog, exptLog)) +
xlim(0, 20) +
ylim(0, 20) +
geom_text(aes(controlLog, exptLog, label=row.names(genes)),size=3)
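As a possible alternative, the transformation can be written directly inside aes(), so that no extra vectors or columns are needed. This is only a sketch under assumptions: it presumes ggplot2 is installed, uses a three-row subset of the question's data, and the object name p is hypothetical.

```r
library(ggplot2)

# Three rows copied from the question's data.
genes <- data.frame(
  control = c(0.189634, 13.752000, 16.584300),
  expt    = c(3.16574, 34.40630, 42.73240),
  row.names = c("gfi1", "Ripply3", "sox2")
)

# Both expressions refer only to columns of genes, so the lookup never
# falls through to variables in the global environment.
p <- ggplot(genes, aes(log2(1 + control), log2(1 + expt),
                       label = row.names(genes))) +
  geom_text(size = 3) +
  xlim(0, 20) +
  ylim(0, 20)

print(p)
```

Because the mappings are set once in ggplot(), geom_text() inherits them and does not need its own aes() call.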
Upvotes: 1