dpuleo

Reputation: 323

Fill the area in specific points in a ggplot

I'm trying to create a plot where I can fill the area only in the points where my coefficients are significant using ggplot2.

I have created this example:

library(data.table)
library(ggplot2)

dt <- data.table(x = 0:23, y = c(0.00788665622373638, 0, 0, 0, 
                     0, 0, 0, 0, 0, 0, 0, 0, 0, 0.031263597681424, 0.0483478996438207, 
                     0.0339161353262161, 0, 0, 0, 0, 0, 0, 0, 0), value = c(0.335524374372203, 
                                                                            0.310445022036626, 0.00348268861151579, 0.000645923627809575, 
                                                                            0.0025476114971974, 0.000979901982654185, 0.00447235816030944, 
                                                                            0.000375791689380511, 0.00850170357523439, 0.185246478252772, 
                                                                            0.236061996429638, 0.611479957550591, 0.916055517054685, 0.047195113633542, 
                                                                            0.00170024647583689, 0.0138696238231373, 0.700687775315984, 0.0562079029293676, 
                                                                            0.00527934454203627, 0.00870851100765857, 0.005848832805464, 
                                                                            0.00300379176492194, 0.00400049813928849, 0.323674152828656))

And using the following code:

plt <- ggplot(dt, aes(x = x, y = y)) +
  geom_line(colour = 'blue') +
  geom_point() +
  geom_area(data = subset(dt, value < 0.1 & y > 0), fill = 'skyblue', alpha = 0.3)

I get this graph:

enter image description here

It seems that it is connecting the points where value is under 0.1, but I only want to color the area under the line where value is under 0.1.

Is there any way around this?

Upvotes: 0

Views: 1526

Answers (1)

missuse

Reputation: 19716

While writing a function to transform the data so that it can be plotted as requested, I found a potential problem with the idea.
Consider a point x where y is positive and value is < 0.1, while the points at x-1 and x+1 have value > 0.1. With geom_area this point would be left out, since the shaded region there collapses to a vertical line, which has zero area. Hence I believe other visualizations could be more useful:
geom_linerange or geom_pointrange are potentially better (and much easier to plot). Here is an example with your data that emphasizes the points where value < 0.1 and y > 0.

ggplot(dt,aes(x=x,y=y)) +
  geom_line(colour='blue') +
  geom_point() +
  geom_linerange(data = dt[dt$value < 0.1, ], aes(ymin = 0, ymax = y), color= "skyblue", size = 1)

enter image description here

Alternatively, use geom_point to emphasize the points where value < 0.1:

ggplot(dt,aes(x=x,y=y)) +
  geom_line(colour='blue') +
  geom_point() +
  geom_point(data = dt[dt$value < 0.1,], color= "red", size = 2)  

enter image description here

If you are really set on using geom_area, here is a function (base R only):

for_area = function(data, val){
  # work on a plain data.frame copy so a data.table input behaves the same
  df = as.data.frame(data)
  # set non-significant points (value >= val) to 0 so they get no fill
  v = ifelse(df$value >= val, 0, df$value)
  y = ifelse(df$value >= val, 0, df$y)
  df$value = v
  df$y = y

  n = nrow(df)
  # rows where a filled run starts: previous y is 0, current y is not
  pre = which(df$y[-n] == 0 & df$y[-1] != 0) + 1
  # rows where a filled run ends: current y is not 0, next y is 0
  # (the last row has no successor, so it is never compared against NA)
  pro = which(df$y[-n] != 0 & df$y[-1] == 0)
  pre = df$x[pre]
  pro = df$x[pro]

  # x1 orders rows that share an x: 0 = start marker, 1 = data, 2 = end marker
  df$x1 = 1
  df = rbind(df, data.frame(x = pre,
                            y = rep(0, length(pre)),
                            value = rep(0, length(pre)),
                            x1 = rep(0, length(pre))))

  df = rbind(df, data.frame(x = pro,
                            y = rep(0, length(pro)),
                            value = rep(0, length(pro)),
                            x1 = rep(2, length(pro))))
  df = df[with(df, order(x, x1)),]
  return(df)
}

With the data from the question:

ggplot(dt,aes(x=x,y=y)) +
  geom_line(colour='blue') +
  geom_point() +
  geom_area(data = for_area(dt, 0.1), fill= "skyblue", alpha = 0.3)

enter image description here

With a more complicated example:

# output of dput(daf), assigned so the data can be recreated
daf <- structure(list(x = 1:25, y = c(0.3, 0.2, 0.2, 0, 0.1, 0.1, 0.3, 
0.2, 0.3, 0.1, 0, 0.3, 0.2, 0.1, 0.3, 0, 0.2, 0.3, 0, 0.1, 0.1, 
0.2, 0.3, 0, 0.3), value = c(0, 0.3, 0, 0, 0, 0.2, 0.3, 0.2, 
0.2, 0.3, 0.2, 0.2, 0, 0, 0.2, 0, 0.2, 0, 0.1, 0.1, 0.1, 0, 0.3, 
0.2, 0.3)), .Names = c("x", "y", "value"), row.names = c(NA, 
-25L), class = "data.frame")

enter image description here

This illustrates the problem I mentioned earlier: at x = 3, value is 0.0 and y = 0.2, but the plot gives no indication of that, because its neighbours at x = 2 and x = 4 have either value > 0.1 or y == 0.
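To check whether a data set contains such isolated points before relying on geom_area, something like the following base R sketch could be used. The toy data frame and the variable names (toy, sig, prev_sig, next_sig, isolated) are made up for illustration and are not part of the original answer; the 0.1 threshold is the one used in the question.

```r
# Hypothetical toy data: x = 3 is significant, but both neighbours are not
toy <- data.frame(x = 1:6,
                  y = c(0.2, 0.1, 0.3, 0.2, 0.1, 0.2),
                  value = c(0.5, 0.5, 0.01, 0.5, 0.01, 0.02))

# A point can only be shaded if it is significant and has a positive y
sig <- toy$value < 0.1 & toy$y > 0

# Pad with FALSE so the first and last rows also have a "neighbour"
prev_sig <- c(FALSE, head(sig, -1))
next_sig <- c(sig[-1], FALSE)

# Isolated: shadeable itself, but neither neighbour is,
# so geom_area would draw a zero-width region there
isolated <- sig & !prev_sig & !next_sig
toy$x[isolated]
```

In this toy data only x = 3 is reported; x = 5 and x = 6 are adjacent to each other, so the area between them has a nonzero width.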

With geom_pointrange, the same data would look like this:

enter image description here
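The code behind this figure is not shown, so the following is only a sketch of how a geom_pointrange version might be written, modelled on the geom_linerange call earlier in this answer. It uses a small made-up data frame (toy) rather than daf so that it runs on its own; the colours and the 0.1 threshold are carried over from the other examples.

```r
library(ggplot2)

# Hypothetical toy data, not the daf data set from the answer
toy <- data.frame(x = 1:6,
                  y = c(0.2, 0.1, 0.3, 0.2, 0.1, 0.2),
                  value = c(0.5, 0.5, 0.01, 0.5, 0.01, 0.02))

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_line(colour = "blue") +
  geom_point() +
  # one vertical range from 0 up to y for every significant point,
  # so isolated points stay visible as well
  geom_pointrange(data = toy[toy$value < 0.1, ],
                  aes(ymin = 0, ymax = y),
                  colour = "skyblue")
p
```

Because each significant point gets its own range, this does not depend on neighbouring points being significant, unlike the area-based approach.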

Perhaps choosing the best from both worlds:

ggplot(daf,aes(x=x,y=y)) +
  geom_line(colour='blue') +
  geom_point() +
  geom_area(data = for_area(daf, 0.1), fill= "skyblue", alpha = 0.3 )+
  geom_linerange(data = daf[daf$value<0.1,], aes(ymin = 0, ymax = y), color= "skyblue", size = 1)

enter image description here

Upvotes: 1
