Why is 'curve' so different from 'lines' and 'points' in R?

Question

I would like to fit frequency data with a discrete generalized beta distribution (DGBD).

The data look like this:

freq = c(1116, 2067, 137, 124, 643, 2042, 55, 47186, 7504, 1488, 211, 1608,
         3517, 7, 896, 378, 17, 3098, 164977, 601, 196, 637, 149, 44, 2,
         1801, 882, 636, 5184, 1851, 776, 343, 851, 33, 4011, 209, 715,
         937, 20, 6922, 2028, 23, 3045, 16, 334, 31, 2)

Rank = rank(-freq, ties.method = "first")
p = freq/sum(freq)
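
As a small illustration of what ties.method = "first" does here (the vector `toy` is made up for this sketch and is not part of the data): tied values receive distinct ranks in order of appearance, so every rank from 1 to the length of the vector occurs exactly once. This matters for the data above because the value 2 appears twice.

```r
# Toy vector with one tie (the two 30s); hypothetical, for illustration only.
toy <- c(10, 30, 20, 30)

# Negating the values ranks the largest value first; "first" breaks the tie
# by position, so the earlier 30 gets rank 1 and the later 30 gets rank 2.
toy_rank <- rank(-toy, ties.method = "first")
toy_rank  # 4 1 3 2
```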

Get the log forms:

log.f = log(freq)
log.p = log(p)
log.rank = log(Rank)
log.inverse.rank = log(length(Rank)+1-Rank)

Linear regression for the discrete generalized beta distribution:

co = coef(lm(log.p ~ log.inverse.rank + log.rank))
zmf = function(x) exp(co[[1]] + co[[2]]*log(length(x) + 1 - x) + co[[3]]*log(x))
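
Written out, the regression fits the log-linear form below. The symbols are introduced here only for illustration: r is the rank, N stands for length(Rank), and c_1, c_2, c_3 stand for co[[1]], co[[2]], co[[3]].

```latex
\log p(r) = c_1 + c_2 \log(N + 1 - r) + c_3 \log r
\qquad\Longleftrightarrow\qquad
p(r) = e^{c_1}\,(N + 1 - r)^{c_2}\, r^{c_3}
```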

Plot:

plot(p ~ Rank, xlim = c(1, 80), log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col = "blue", add = TRUE)
xx = seq_along(Rank)
lines(zmf(xx) ~ xx, col = "red")
points(zmf(xx) ~ xx, col = "purple")

Figure 1. The plot looks like this.

My question: what is the right way to display the fitted result, lines (or points) or curve?

Update:

Although I have not figured out the underlying logic, I have found a solution:

@Frank pointed out the trick of setting n in curve, and it solves the problem. So n needs to be specified in curve when overlaying the fit on the raw data, although in many situations n can be left at its default.

plot(p ~ Rank, log = "xy", xlab = "Rank (log)", ylab = "Probability (log)")
curve(zmf, col = "blue", add = TRUE, n = length(Rank)) # set the number of x values at which to evaluate
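
One likely reason this works: zmf computes length(x) from whatever vector it is given, and curve evaluates its function on a grid of n points (101 by default), so inside curve the N + 1 term becomes 102 instead of length(Rank) + 1. The sketch below reproduces that effect without plotting. The coefficients, the names zmf_len and zmf_fixed, and the evenly spaced grid are assumptions made for illustration, not the fitted values or the exact grid curve uses.

```r
# Hypothetical coefficients, chosen only for illustration (not the fit).
co <- c(-2, 0.5, -1.2)
N  <- 47  # number of ranks in the question's data

# Form used in the question: the "N" term is the length of the input vector.
zmf_len <- function(x) exp(co[1] + co[2] * log(length(x) + 1 - x) + co[3] * log(x))

# Variant with N fixed outside the function, independent of the input length.
zmf_fixed <- function(x) exp(co[1] + co[2] * log(N + 1 - x) + co[3] * log(x))

# On the integer ranks both versions agree, because length(xx) == N.
xx <- 1:N
agree_on_ranks <- isTRUE(all.equal(zmf_len(xx), zmf_fixed(xx)))

# On a 101-point grid (the default size curve uses), length(x) is 101,
# so zmf_len no longer matches the model that was fitted with N = 47.
grid101 <- seq(1, N, length.out = 101)
differ_on_grid <- !isTRUE(all.equal(zmf_len(grid101), zmf_fixed(grid101)))

# With 47 grid points the lengths coincide again, which is what n = length(Rank) achieves.
grid47 <- seq(1, N, length.out = N)
agree_on_grid47 <- isTRUE(all.equal(zmf_len(grid47), zmf_fixed(grid47)))
```

Under these assumptions, a function written like zmf_fixed could be passed to curve with any n, for example curve(zmf_fixed, from = 1, to = N, add = TRUE), and it should trace the same line that lines and points draw.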

Figure 2. The right way to use curve: specify n.
