FGiorlando

Reputation: 1131

ggplot2 geom_smooth fails with error with scale_y_probit

I am trying to plot lines of best fit to a cumulative distribution. I am representing the results using a reciprobit plot (log latency vs. probit cumulative probability).

library(plyr)     # arrange(), ddply()
library(ggplot2)

grp<-c("g1","g1","g1","g1","g2","g2","g2","g2","g3","g3","g3","g3")
lat<-c(1, 4, 6, 8, 2, 3, 7, 9, 1, 4, 8, 8)

data<-data.frame(grp,lat)

d.f <- arrange(data,grp,lat)  # sort data into ascending values
d.f.ecdf <- ddply(d.f, .(grp), transform, ecdf=ecdf(lat)(lat) )  # calculate ecdf per group

p <- ggplot( d.f.ecdf, aes(lat, ecdf, colour = grp) )

p+geom_point()+
scale_x_log10()+
scale_y_probit()

Everything works up to this point, but if I add

p+scale_y_probit()+geom_smooth()

OR

p+scale_y_probit()+stat_smooth()

I get the error: Error: NA/NaN/Inf in foreign function call (arg 1)

It works with most other distributions, for example

p+geom_point()+
scale_x_log10()+
scale_y_inverse()+
geom_smooth()

Is there any way around this issue?

Upvotes: 1

Views: 1211

Answers (1)

joran

Reputation: 173677

You compute the ECDF for each group, so the largest observation in each group gets a value of exactly 1. The probit function evaluated at 1 is infinite. (Probit(1) is the value of a standard normal random variable that has all of the probability to its left, i.e. the area to the left of this value should be 1. No finite value satisfies that, so the result is infinite.)
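To see this directly, here is a small sketch in base R. It assumes that `qnorm()` (the standard normal quantile function) is a fair stand-in for the probit transform the scale applies; the vector name `probs` is just for illustration.

```r
# qnorm() is the standard normal quantile function, i.e. the probit transform.
probs <- c(0.25, 0.5, 0.75, 1)
z <- qnorm(probs)
print(z)

# Only the entry for a probability of exactly 1 is non-finite.
print(is.finite(z))
```

The same thing would happen at a probability of exactly 0, which maps to negative infinity.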

And the scatterplot smoothing methods (and most other fitting methods as well) won't play nicely with infinite response values.

After building the data frame, you can change all the values in ecdf that are exactly 1 to something just slightly less than 1, and your code will run without errors.
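A minimal sketch of that adjustment, using the data from the question. To keep it self-contained it computes the per-group ECDF with base R's `ave()` rather than `ddply()`. The offset `eps` and the column name `ecdf_adj` are arbitrary choices for illustration, not anything ggplot2 requires.

```r
grp <- c("g1","g1","g1","g1","g2","g2","g2","g2","g3","g3","g3","g3")
lat <- c(1, 4, 6, 8, 2, 3, 7, 9, 1, 4, 8, 8)
d.f <- data.frame(grp, lat)

# Per-group ECDF in base R (same idea as the ddply() call in the question).
d.f$ecdf <- ave(d.f$lat, d.f$grp, FUN = function(x) ecdf(x)(x))

# Assumption: eps is an arbitrary small offset; pick whatever suits your data.
eps <- 1e-3

# Cap the ECDF just below 1 so the probit transform stays finite.
d.f$ecdf_adj <- pmin(d.f$ecdf, 1 - eps)

# Check that every transformed value is now finite.
print(all(is.finite(qnorm(d.f$ecdf_adj))))
```

Plotting `ecdf_adj` instead of `ecdf` on the y axis should then let the smoother fit. Note that the capped points end up at a position on the probit axis that depends on `eps`, so they can pull on the fitted line; dropping those rows before smoothing is another option.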

Upvotes: 1
