fugu

Reputation: 6578

Misplaced points in ggplot

I'm reading in a file like so:

genes<-read.table("goi.txt",header=TRUE, row.names=1)
control<-log2(1+(genes[,1]))
experiment<-log2(1+(genes[,2]))

And plotting them as a simple scatter in ggplot:

ggplot(genes, aes(control, experiment)) +
    xlim(0, 20) + 
    ylim(0, 20) +
    geom_text(aes(control, experiment, label=row.names(genes)),size=3)

However the points are incorrectly placed on my plot (see attached image)

This is my data:

          control     expt
gfi1     0.189634  3.16574
Ripply3 13.752000 34.40630
atonal   2.527670  4.97132
sox2    16.584300 42.73240
tbx15    0.878446  3.13560
hes8     0.830370  8.17272
Tlx1     1.349330  7.33417
pou4f1   3.763400  9.44845
pou3f2   0.444326  2.92796
neurog1 13.943800 24.83100
sox3    17.275700 26.49240
isl2     3.841100 10.08640

As you can see, 'Ripply3' is clearly in the wrong position on the graph!

Am I doing something really stupid?


Upvotes: 1

Views: 377

Answers (1)

joran

Reputation: 173727

The aes() function used by ggplot looks first inside the data frame you provide via data = genes. This is why you can (and should) specify variables only by bare column names like control; ggplot will automatically know where to find the data.

But R's scoping rules are such that if nothing by that name is found in the current environment, R looks in the parent environment, and so on up through the global environment, until it finds something by that name.

So aes(control, experiment) looks for variables named control and experiment inside the data frame genes. It finds the original, untransformed control variable, but of course there is no experiment variable in genes (the column is called expt). So it continues up the chain of environments until it hits the global environment, where you have defined the isolated variable experiment, and uses that. The plot therefore mixes the raw control values on the x-axis with the log-transformed experiment values on the y-axis.
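
The lookup order described above can be reproduced in base R without ggplot. The sketch below is only an analogy: it uses eval() with a data frame as the first place to search, which is not necessarily how ggplot evaluates aes() internally, and it uses just the first three rows of the question's data as a stand-in. The names x_used and y_used are made up for this illustration.

```r
# Stand-in for the question's data (first three rows only).
genes <- data.frame(
  control = c(0.189634, 13.752000, 2.527670),
  expt    = c(3.16574, 34.40630, 4.97132),
  row.names = c("gfi1", "Ripply3", "atonal")
)

# Free-standing vectors defined outside the data frame, as in the question.
control    <- log2(1 + genes[, 1])
experiment <- log2(1 + genes[, 2])

# eval() searches the data frame first, then the calling environment.
x_used <- eval(quote(control), genes)     # found in genes: the raw column
y_used <- eval(quote(experiment), genes)  # not in genes: falls back to the vector

identical(x_used, genes$control)  # TRUE: the raw column masks the log2 vector
identical(y_used, experiment)     # TRUE: the log2 vector is picked up instead
```

Because a column named control exists in genes, the transformed vector of the same name is never consulted, while experiment has no matching column and is silently taken from outside the data frame.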

You meant to do something more like this:

genes$controlLog <- log2(1+(genes[,1]))
genes$exptLog <- log2(1+(genes[,2]))

followed by:

ggplot(genes, aes(controlLog, exptLog)) +
     xlim(0, 20) + 
     ylim(0, 20) +
     geom_text(aes(controlLog, exptLog, label=row.names(genes)),size=3)

Upvotes: 1
