Michael
Michael

Reputation: 11

Using python and networkx to find the probability density function

I'm struggling to draw a power law graph for Facebook Data that I found online. I'm using Networkx and I've found how to draw a Degree Histogram and a degree rank. The problem that I'm having is I want the y axis to be a probability so I'm assuming I need to sum up each y value and divide by the total number of nodes? Can anyone please help me do this? Once I've got this I'd like to draw a log-log graph to see if I can obtain a straight line. I'd really appreciate it if anyone could help! Here's my code:

import collections 
import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community
import math
import pylab as plt

g = nx.read_edgelist("/Users/Michael/Desktop/anaconda3/facebook_combined.txt","r")
nx.info(g)

degree_sequence = sorted([d for n, d in g.degree()], reverse=True)
degreeCount = collections.Counter(degree_sequence)
deg, cnt = zip(*degreeCount.items())
fig, ax = plt.subplots()
plt.bar(deg, cnt, width=0.80, color='b')
plt.title("Degree Histogram for Facebook Data")
plt.ylabel("Count")
plt.xlabel("Degree")
ax.set_xticks([d + 0.4 for d in deg])
ax.set_xticklabels(deg)
plt.show()

plt.loglog(degree_sequence, 'b-', marker='o')
plt.title("Degree rank plot")
plt.ylabel("Degree")
plt.xlabel("Rank")
plt.show()

Upvotes: 1

Views: 2438

Answers (1)

Bonlenfum
Bonlenfum

Reputation: 20155

You seem to be on the right tracks, but some simplifications will likely help you. The code below uses only 2 libraries.

Without access your graph, we can use some graph generators instead. I've chosen 2 qualitatively different types here, and deliberately chosen different sizes so that the normalization of the histogram is needed.

import networkx as nx
import matplotlib.pyplot as plt

g1 = nx.scale_free_graph(1000, ) 
g2 = nx.watts_strogatz_graph(2000, 6, p=0.8) 

# we don't need to sort the values since the histogram will handle it for us
deg_g1 = nx.degree(g1).values()
deg_g2 = nx.degree(g2).values()
# there are smarter ways to choose bin locations, but since
# degrees must be discrete, we can be lazy...
max_degree = max(deg_g1 + deg_g2)

# plot different styles to see both
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(deg_g1, bins=xrange(0, max_degree), density=True, histtype='bar', rwidth=0.8)
ax.hist(deg_g2, bins=xrange(0, max_degree), density=True, histtype='step', lw=3) 

# setup the axes to be log/log scaled
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('degree')
ax.set_ylabel('relative density')
ax.legend()

plt.show()

This produces an output plot like this (both g1,g2 are randomised so won't be identical):

degree distributions

Here we can see that g1 has an approximately straight line decay in the degree distribution -- as expected for scale-free distributions on log-log axes. Conversely, g2 does not have a scale-free degree distribution.

To say anything more formal, you could look at the toolboxes from Aaron Clauset: http://tuvalu.santafe.edu/~aaronc/powerlaws/ which implement model fitting and statistical testing of power-law distributions.

Upvotes: 1

Related Questions