Fillipe
Fillipe

Reputation: 153

Estimating the Poisson distribution

I have a graph and I calculated the degree distribution and degree as follows:

library(igraph) # for these two functions

dd <- degree_distribution(graph) 
d <- degree(graph)

From this, I estimated to Power Law, to see if my distribution follows the "Law of Power":

degree = 1:max(d)
probability = dd[-1]

nonzero.position = which(probability != 0)
probability = probability[nonzero.position]
degree = degree[nonzero.position]

reg = lm(log(probability) ~ log(degree))
cozf = coef(reg)

power.law.fit = function(x) exp(cozf[[1]] + cozf[[2]] * log(x))

From that, I plotted the points and power law using ggplot2. Resulting in the following image:

df <- data.frame(x = degree, y = probability)
  print(
      ggplot(df, aes(x,y,colour="Distribuição"))+
        geom_point(shape = 4) +
        stat_function(fun = power.law.fit, geom = "line", aes(colour="Power Law"))+

        labs(title = "Grafo", subtitle = "Distribuição dos Graus",
             x="K", y="P(k)", colour="Legenda")+
        scale_color_brewer(palette="Dark2")
  )

PowerLaw and Data

As you can see, my distribution does not follow Power Law! I would like to estimate the Poisson distribution and plot on the same graph. Even though I'm not sure that my distribution does not follow (or follow) Poisson, I would like to draw it together with Power Law. I have no idea how to estimate this distribution (Poisson) from the data, and calculate the average degree.

Can anyone help me?

The graph used to calculate the distribution and the degree is very large (700 thousand vertices), so I did not put the data of the graphs. The explanation of the answer can be based on any graph.

Upvotes: 5

Views: 1861

Answers (1)

Brian
Brian

Reputation: 8295

From ?dpois:

The Poisson distribution has density

p(x) = λ^x exp(-λ)/x!

for x = 0, 1, 2, … . The mean and variance are E(X) = Var(X) = λ.

So I'll generate some dummy data with a secret lambda:

mysecret <- ####

x <- data.frame(xes = rpois(50, mysecret))
> x$xes
 [1] 0 2 2 1 1 4 1 1 0 2 2 2 1 0 0 1 2 3 2 4 2 1 0 3 2 1 3 1 2 1 5 0 2 3 2 1 0 1 2 3 0 1 2 2 0 3 2 2 2 3


> mean(x$xes)
[1] 1.66
> var(x$xes)
[1] 1.371837

So two good guesses for my secret lambda are 1.66 and 1.37. Let's try them:

library(ggplot2)
ggplot(x, aes(xes)) + 
  geom_histogram(aes(y = ..density.., color = "Raw data"), 
                 fill = "white", binwidth = 1, center = 0, size = 1.5) +
  stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on E(X)"), 
               fun.args = list(lambda = 1.66), geom = "line", size = 1.5) +
  stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on Var(X)"), 
               fun.args = list(lambda = 1.37), geom = "line", size = 1.5)

enter image description here

They're both pretty good. You can't really use the built-in stat_function or geom_density for generating these, since Poisson distributions are only defined for integers. The histogram and summary functions work well though, since they're only estimated at the data points themselves, not interpolated.

If you want more detail, you can use the MASS package:

MASS::fitdistr(x$xes, dpois, start = list(lambda = 1))
    lambda  
  1.6601563 
 (0.1822258)

So let's try constructing from that:

library(dplyr)
df <- data_frame(xes = seq.int(max(x$xes)+1)-1,
                 dens.m = dpois(xes, 1.66),
                 dens.u = dpois(xes, 1.66+0.18),
                 dens.l = dpois(xes, 1.66-0.18))
> df
# A tibble: 6 x 4
    xes     dens.m     dens.u     dens.l
  <dbl>      <dbl>      <dbl>      <dbl>
1     0 0.19013898 0.15881743 0.22763769
2     1 0.31563071 0.29222406 0.33690378
3     2 0.26197349 0.26884614 0.24930880
4     3 0.14495866 0.16489230 0.12299234
5     4 0.06015785 0.07585046 0.04550717
6     5 0.01997240 0.02791297 0.01347012
ggplot(x, aes(xes)) + 
  geom_histogram(aes(y = ..density..), color = "black",
                 fill = "white", binwidth = 1, center = 0, size = 1.5) +
  geom_ribbon(data = df, aes(xes, ymin = dens.l, ymax = dens.u), fill = "grey50", alpha = 0.5) +
  geom_line(data = df, aes(xes, dens.m, color = "Based on E(X)\n+/-1 SD of lambda"), size = 1.5)

enter image description here

Based on these two methods and visual interpretation, you should feel comfortable saying λ = 1.66+/-0.18.

For reference, my secret initial value was 1.5.

Upvotes: 3

Related Questions