Reputation: 153
I have a graph and I calculated the degree distribution and degree as follows:
library(igraph) # for these two functions
dd <- degree_distribution(graph)
d <- degree(graph)
From this, I estimated to Power Law, to see if my distribution follows the "Law of Power":
degree = 1:max(d)
probability = dd[-1]
nonzero.position = which(probability != 0)
probability = probability[nonzero.position]
degree = degree[nonzero.position]
reg = lm(log(probability) ~ log(degree))
cozf = coef(reg)
power.law.fit = function(x) exp(cozf[[1]] + cozf[[2]] * log(x))
From that, I plotted the points and power law using ggplot2
.
Resulting in the following image:
df <- data.frame(x = degree, y = probability)
print(
ggplot(df, aes(x,y,colour="Distribuição"))+
geom_point(shape = 4) +
stat_function(fun = power.law.fit, geom = "line", aes(colour="Power Law"))+
labs(title = "Grafo", subtitle = "Distribuição dos Graus",
x="K", y="P(k)", colour="Legenda")+
scale_color_brewer(palette="Dark2")
)
As you can see, my distribution does not follow Power Law! I would like to estimate the Poisson distribution and plot on the same graph. Even though I'm not sure that my distribution does not follow (or follow) Poisson, I would like to draw it together with Power Law. I have no idea how to estimate this distribution (Poisson) from the data, and calculate the average degree.
Can anyone help me?
The graph used to calculate the distribution and the degree is very large (700 thousand vertices), so I did not put the data of the graphs. The explanation of the answer can be based on any graph.
Upvotes: 5
Views: 1861
Reputation: 8295
From ?dpois
:
The Poisson distribution has density
p(x) = λ^x exp(-λ)/x!
for x = 0, 1, 2, … . The mean and variance are E(X) = Var(X) = λ.
So I'll generate some dummy data with a secret lambda:
mysecret <- ####
x <- data.frame(xes = rpois(50, mysecret))
> x$xes [1] 0 2 2 1 1 4 1 1 0 2 2 2 1 0 0 1 2 3 2 4 2 1 0 3 2 1 3 1 2 1 5 0 2 3 2 1 0 1 2 3 0 1 2 2 0 3 2 2 2 3 > mean(x$xes) [1] 1.66 > var(x$xes) [1] 1.371837
So two good guesses for my secret lambda are 1.66 and 1.37. Let's try them:
library(ggplot2)
ggplot(x, aes(xes)) +
geom_histogram(aes(y = ..density.., color = "Raw data"),
fill = "white", binwidth = 1, center = 0, size = 1.5) +
stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on E(X)"),
fun.args = list(lambda = 1.66), geom = "line", size = 1.5) +
stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on Var(X)"),
fun.args = list(lambda = 1.37), geom = "line", size = 1.5)
They're both pretty good. You can't really use the built-in stat_function
or geom_density
for generating these, since Poisson distributions are only defined for integers. The histogram and summary functions work well though, since they're only estimated at the data points themselves, not interpolated.
If you want more detail, you can use the MASS
package:
MASS::fitdistr(x$xes, dpois, start = list(lambda = 1))
lambda 1.6601563 (0.1822258)
So let's try constructing from that:
library(dplyr)
df <- data_frame(xes = seq.int(max(x$xes)+1)-1,
dens.m = dpois(xes, 1.66),
dens.u = dpois(xes, 1.66+0.18),
dens.l = dpois(xes, 1.66-0.18))
> df # A tibble: 6 x 4 xes dens.m dens.u dens.l <dbl> <dbl> <dbl> <dbl> 1 0 0.19013898 0.15881743 0.22763769 2 1 0.31563071 0.29222406 0.33690378 3 2 0.26197349 0.26884614 0.24930880 4 3 0.14495866 0.16489230 0.12299234 5 4 0.06015785 0.07585046 0.04550717 6 5 0.01997240 0.02791297 0.01347012
ggplot(x, aes(xes)) +
geom_histogram(aes(y = ..density..), color = "black",
fill = "white", binwidth = 1, center = 0, size = 1.5) +
geom_ribbon(data = df, aes(xes, ymin = dens.l, ymax = dens.u), fill = "grey50", alpha = 0.5) +
geom_line(data = df, aes(xes, dens.m, color = "Based on E(X)\n+/-1 SD of lambda"), size = 1.5)
Based on these two methods and visual interpretation, you should feel comfortable saying λ = 1.66+/-0.18.
For reference, my secret initial value was 1.5.
Upvotes: 3