Phil S.

Reputation: 300

geom_point points manual scaling

I have some data (named result.df) that looks like this:

    orgaName                  abundance          pVal         score        
     A                        3          9.998622e-01     1.795338e-04
     B                        2          9.999790e-01     1.823428e-05
     C                        1          2.225074e-308    3.076527e+02
     D                        1          3.510957e-01     4.545745e-01

and so on...

What I am now plotting is this:

p1 <- ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
   ylab("1 - P-Value")+
   xlab("log2(abundance)")+
   geom_point(aes(size=score))+
   ggtitle(colnames(case.count.matrix)[i])+
   geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+       
   geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
   theme_classic()

Everything works and looks fine. However, I would like the point size introduced through

geom_point(aes(size=score))+

to be scaled against fixed values. The legend should use a base-10 logarithmic scale while the scores themselves stay unchanged, so that low scores nearly disappear and large scores get comparable point sizes across different "result.df" data frames.

EDIT

After checking the comments of @roman and @vrajs5, I was able to produce a new plot using the following code:

   ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
   ylab("1 - P-Value")+
   xlab("log2(abundance)")+
   geom_point(aes(size=score))+
   ggtitle(colnames(case.count.matrix)[i])+    
   #geom_text(data=subset(result.df, pVal < 0.05 & log2(abundance) > xInt),hjust=.65, vjust=-1.2,size=2.5)+
   geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+
   geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
   #geom_vline(aes(xintercept=xInt), colour="blue", linetype="dashed")+
   #geom_text(data=subset(result.df, pVal > 0.05 & log2(abundance) > xInt),alpha=.5,hjust=.65, vjust=-1.2,size=2)+
   #geom_text(data=subset(result.df, pVal < 0.05 & log2(abundance) < xInt),alpha=.5,hjust=.65, vjust=-1.2,size=2)+
   theme_classic() + 
   scale_size(range=c(2,12),expand=c(2,0),breaks=c(0,1,10,100,1000,1000000),labels=c(">=0",">=1",">=10",">=100",">=1000",">=1000000"),guide="legend")

As you can see, the breaks are introduced and labeled as intended. However, the point sizes in the legend do not reflect the point sizes in the plot. Any idea how to fix this?

Upvotes: 7

Views: 27693

Answers (2)

Romu. P

Reputation: 31

I see the thread has not been completely solved, so here is my contribution.

I faced the same problem; the answer has been partially given by Roman Luštrik.

All you have to do is use scale_size() with both the range and the limits arguments.

Using the previous example, but adjusting it to your range of values:


result.df = read.table(text = 'orgaName                  abundance          pVal         score        
A                        3          9.998622e-01     1.795338e-04
B                        2          9.999790e-01     1.823428e-05
C                        1          2.225074e-308    3.076527e+02
D                        1          3.510957e-01     4.545745e-01
E                        3          2.510957e-01     2.545745e+00
F                        3          1.510957e-02     2.006527e+02
G                        2          5.510957e-01     3.545745e-02', header = T)

library(ggplot2)

ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
  ylab("1 - P-Value")+
  xlab("log2(abundance)")+
  geom_point(aes(size=score))+
  #ggtitle(colnames(case.count.matrix)[i])+
  geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+       
  geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
  theme_classic() + 
  scale_size(range = c(2,10), limits = c(1,1000000))


Of course, such a wide scale (1 to 1000000) will not render very nicely, as you can't have a size scale where the last circle is 1000000 times bigger than the first one.

For this, I suggest creating a new column in the original table that holds log10(score), then applying the same logic with these values instead.
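
The log10 suggestion could look roughly like the sketch below. The column name logScore, the bounds in log.limits and the break positions in log.breaks are assumptions made for illustration, not values from the question. Because a continuous scale by default treats values outside its limits as missing, the sketch clamps the log scores to the bounds first so that no point is dropped.

```r
library(ggplot2)

result.df <- read.table(text = "orgaName abundance pVal score
A 3 9.998622e-01 1.795338e-04
B 2 9.999790e-01 1.823428e-05
C 1 2.225074e-308 3.076527e+02
D 1 3.510957e-01 4.545745e-01
E 3 2.510957e-01 2.545745e+00
F 3 1.510957e-02 2.006527e+02
G 2 5.510957e-01 3.545745e-02", header = TRUE)

# Assumed fixed bounds in log10 units, shared by every plot; adjust to your scores.
log.limits <- c(-5, 6)

# New column (hypothetical name): log10 of the score, clamped to the bounds
# so that no value falls outside the scale limits.
result.df$logScore <- pmin(pmax(log10(result.df$score), log.limits[1]), log.limits[2])

# Assumed legend breaks, one every two decades.
log.breaks <- c(-4, -2, 0, 2, 4, 6)

p <- ggplot(result.df, aes(log2(abundance), 1 - pVal, label = orgaName)) +
  ylab("1 - P-Value") +
  xlab("log2(abundance)") +
  geom_point(aes(size = logScore)) +
  geom_text(data = subset(result.df, pVal < 0.05), hjust = .65, vjust = -1.2, size = 2.5) +
  geom_hline(yintercept = .95, colour = "blue", linetype = "dashed") +
  theme_classic() +
  # Same range and limits for every data frame; labels show the original score.
  scale_size(name = "score", range = c(1, 10), limits = log.limits,
             breaks = log.breaks, labels = paste0("1e", log.breaks))
p
```

Since range and limits are the same for every call, a given score should map to the same point size in every plot built this way.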

Upvotes: 0

vrajs5

Reputation: 4126

As @Roman mentioned, if you use scale_size you can specify the limits on size.

The following example shows how to control the size of the points:

result.df = read.table(text = 'orgaName                  abundance          pVal         score        
A                        3          9.998622e-01     1.795338e-04
B                        2          9.999790e-01     1.823428e-05
C                        1          2.225074e-308    3.076527e+02
D                        1          3.510957e-01     4.545745e-01
E                        3          2.510957e-01     2.545745e+00
F                        3          1.510957e-02     2.006527e+02
G                        2          5.510957e-01     3.545745e-02', header = T)

library(ggplot2)
ggplot(result.df, aes(log2(abundance), (1-pVal), label=orgaName)) +
  ylab("1 - P-Value")+
  xlab("log2(abundance)")+
  geom_point(aes(size=score))+
  #ggtitle(colnames(case.count.matrix)[i])+
  geom_text(data=subset(result.df, pVal < 0.05),hjust=.65, vjust=-1.2,size=2.5)+       
  geom_hline(aes(yintercept=.95), colour="blue", linetype="dashed")+
  theme_classic() + 
  scale_size(range = c(2,12))

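
Note that range on its own only sets the smallest and largest drawn size; the scale is still trained on each data set, so the same score can come out at different sizes in different plots. The sketch below illustrates this with two made-up data frames (df1, df2) and a hypothetical helper, point_sizes(), which reads the computed sizes back via ggplot_build(). None of these names come from the question.

```r
library(ggplot2)

# Two toy data sets that share the scores 1 and 10 but differ in their maximum.
df1 <- data.frame(x = 1:3, y = 1:3, score = c(1, 10, 100))
df2 <- data.frame(x = 1:3, y = 1:3, score = c(1, 10, 1000))

# Hypothetical helper: build the plot and return the sizes computed for the points.
point_sizes <- function(df, size_scale) {
  p <- ggplot(df, aes(x, y)) + geom_point(aes(size = score)) + size_scale
  ggplot_build(p)$data[[1]]$size
}

# range only: limits default to each data set's own score range.
free1 <- point_sizes(df1, scale_size(range = c(2, 12)))
free2 <- point_sizes(df2, scale_size(range = c(2, 12)))

# range plus fixed limits: both plots share one score-to-size mapping.
fixed1 <- point_sizes(df1, scale_size(range = c(2, 12), limits = c(1, 1000)))
fixed2 <- point_sizes(df2, scale_size(range = c(2, 12), limits = c(1, 1000)))

# Compare the size assigned to score == 10 in each case.
print(c(free1[2], free2[2]))
print(c(fixed1[2], fixed2[2]))
```

With fixed limits the two sizes for score 10 should agree, which is what makes plots of different "result.df" objects comparable.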

Upvotes: 6
