Fitting a curve: which model describes the distribution of weights in a weighted knowledge graph?

Question

As a simple model to represent a knowledge network and learn about properties of weighted graphs, I computed the cosine similarity between Wikipedia articles.

I am now looking at the distribution of the similarity weights for each article (see pictures).

In the pictures, you can see that the slope of the curve changes around a certain value (perhaps from exponential to linear). I would like to fit the curve and extract the value at which the derivative visibly (or expectedly) changes, so that I can divide the similar articles into two sets: the "most similar" (left of the threshold) and the "others" (right of the threshold).
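
To make the threshold idea concrete, here is a rough sketch of one heuristic I could imagine using, not a settled method: take the weights of one article sorted in decreasing order and pick the point farthest from the straight line joining the first and last points (a simple "elbow" rule). The function name `knee_index` and the synthetic `weights` are my own placeholders, and I am assuming the curve in the pictures is such a rank-ordered plot.

```python
import math

def knee_index(values):
    """Index of the point farthest from the chord joining the first and
    last points of the curve (a simple "elbow" heuristic)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    # Rescale both axes to [0, 1] so that neither dominates the distance.
    xs = [i / (n - 1) for i in range(n)]
    ys = [(v - lo) / span for v in values]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    # Perpendicular distance of every point from the chord.
    dists = [abs(dx * (y - ys[0]) - dy * (x - xs[0])) / norm
             for x, y in zip(xs, ys)]
    return max(range(n), key=dists.__getitem__)

# Synthetic stand-in (an assumption, not my real data): a fast exponential
# drop followed by a slow linear tail, sorted in decreasing order.
weights = [0.8 * math.exp(-r / 5) + 0.2 * (1 - r / 99) for r in range(100)]

k = knee_index(weights)
most_similar, others = weights[:k], weights[k:]
print(k, len(most_similar), len(others))
```

The rescaling step matters because ranks and similarity values live on very different scales; without it the chosen point would depend mostly on the number of articles.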

For each article, I want to fit the curve of its distribution, compare that distribution with the mean distribution over all articles, and compare it with the distribution of a random weighted network. (Your suggestions on defining a working procedure are most welcome: I would like to use this as a toy model and then learn how a network, or an article, may evolve over time.)
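
As an illustration of what I mean by "compare", one candidate I am aware of is the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical cumulative distributions. The sketch below computes it by hand so the logic is visible; `scipy.stats.ks_2samp` provides the same statistic together with a p-value. The samples `article` and `baseline` are synthetic placeholders, and using uniform random weights as the "random network" is only my assumption.

```python
import random
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # The gap between two step functions can only peak at an observed value.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

rng = random.Random(0)
# Placeholder data: weights of one article vs. weights drawn uniformly at random.
article = [rng.expovariate(10) for _ in range(200)]
baseline = [rng.random() for _ in range(200)]

d_self = ks_statistic(article, article)   # identical samples give 0
d_rand = ks_statistic(article, baseline)  # closer to 1 means more different
print(d_self, d_rand)
```

The same function could be applied between one article and the pooled weights of all articles, which is a different comparison from "the mean distribution" but may be a reasonable first look.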

My background is in User Experience with a twist of data science. I would like to understand better which model may describe the distribution of values I observed, what a proper way to compare distributions is, and which Python tools (or Mathematica 11) I can use to fit the curve and obtain the derivative at each point.

Which model do you suggest to describe the distribution of observed similarity values between objects in a weighted network? (Here, a collaborative knowledge base is represented as a weighted network, where each weight is the similarity value of two given articles. Should I expect an exponential? A Poissonian? Why?)
How can I compute the curve fit and extract the derivative of the curve at a given point (Python or Mathematica 11)?
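
For the second question, this is roughly the kind of workflow I imagine in Python, offered as a sketch under assumptions rather than something I know to be right: fit an "exponential plus linear" curve with `scipy.optimize.curve_fit`, then take the derivative either analytically from the fitted parameters or numerically with `numpy.gradient`. The functional form, the parameter names (`a`, `tau`, `b`, `c`) and the synthetic data are all my own guesses, not something derived from the real distribution.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(r, a, tau, b, c):
    """Assumed shape: exponential decay plus a linear trend."""
    return a * np.exp(-r / tau) + b * r + c

def model_derivative(r, a, tau, b, c):
    """Analytic derivative of `model` with respect to r."""
    return -(a / tau) * np.exp(-r / tau) + b

# Synthetic stand-in for one article's rank-ordered similarity weights.
rng = np.random.default_rng(0)
rank = np.arange(100, dtype=float)
observed = model(rank, 0.8, 5.0, -0.002, 0.2) + rng.normal(0.0, 0.005, rank.size)

# Keep tau positive so the exponential term cannot blow up during the fit.
lower = [-np.inf, 1e-3, -np.inf, -np.inf]
popt, pcov = curve_fit(model, rank, observed,
                       p0=(1.0, 10.0, 0.0, 0.1), bounds=(lower, np.inf))

slope_fit = model_derivative(rank, *popt)    # derivative from the fitted curve
slope_numeric = np.gradient(observed, rank)  # finite differences on raw data
print(popt)
```

The analytic derivative is smooth because it comes from the fitted curve, whereas the finite-difference version amplifies noise in the data, so the two may disagree point by point even when the fit is good.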
