user305883

Reputation: 1741

Fitting a curve: which model describes the weight distribution in weighted knowledge graphs?

As a simple model to represent a knowledge network and learn about properties of weighted graphs, I computed the cosine similarity between Wikipedia articles.

I am now looking at the distribution of the similarity weights for each article (see pictures).

In the pictures, you can see that the slope of the curve changes around a certain value (maybe from exponential to linear). I would like to fit the curve and extract the value where the derivative visibly (or expectedly) changes, so that I can divide similar articles into two sets: the "most similar" (left side of the threshold) and the "others" (right side of the threshold).

I want to fit the curve for each article's distribution, compare that distribution with the mean distribution over all articles, and compare it with the distribution of a random weighted network. (Your suggestions for defining a working procedure are most welcome: I would like to use this as a toy model to then learn how a network, or an article, may evolve over time.)

My background is User Experience with a twist of data science. I would like to better understand which model may describe the distribution of values I observed, what a proper way to compare distributions is, and which Python tools (or Mathematica 11) I can use to fit the curve and obtain the derivative at each point.

[Five images: plots of the similarity-weight distribution for individual articles]

Upvotes: 1

Views: 116

Answers (1)

Mike Pierce

Reputation: 1534

Working with Mathematica, suppose your data is in the list data. To find the cubic polynomial that best fits your data, use the Fit function:

Fit[data, {1, x, x^2, x^3}, x]

In general the usage for the Fit command looks like

Fit["data set", "list of functions", "independent variable"] 

where Mathematica finds the least-squares fit of a linear combination of the functions in that list to your data set. I'm not sure what sort of curve would model this data best, but remember that any continuous function on a closed interval can be approximated to arbitrary precision by a polynomial of sufficiently high degree. So if you have the computational power to spare, you can let your list of functions be a long list of powers of x. It does look like you have an asymptote at x = 0, though, so consider including a 1/x term to capture that. You can then use Plot to draw the fitted curve on top of your data and compare them visually.
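
Since the question also asks about Python, here is a minimal sketch of the same idea (fitting a linear combination of basis functions by least squares) using only the standard library. The function name fit_linear_combination, the basis list, and the synthetic data are all made up for illustration, and solving the normal equations is just one simple way to do this, not a description of how Fit works internally.

```python
def fit_linear_combination(xs, ys, basis):
    """Return least-squares coefficients c so that
    sum(c[j] * basis[j](x)) approximates y over the data."""
    n = len(basis)
    # Design matrix: each row holds the basis functions evaluated at one x.
    rows = [[g(x) for g in basis] for x in xs]
    # Normal equations: (A^T A) c = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    # Solve with Gaussian elimination and partial pivoting.
    aug = [ata[i] + [aty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (aug[i][n] - rest) / aug[i][i]
    return coeffs


# Hypothetical basis mirroring {1, x, x^2, x^3} from the Fit call above.
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2, lambda x: x**3]

# Synthetic data generated from a known cubic (not the asker's data).
xs = [0.5 * k for k in range(1, 11)]
ys = [2.0 - 3.0 * x + 0.5 * x**2 + 0.25 * x**3 for x in xs]

coeffs = fit_linear_combination(xs, ys, basis)
print(coeffs)
```

To try a 1/x term, you could append lambda x: 1.0 / x to the basis list, provided no x value in the data is zero. With many high powers of x the normal equations become badly conditioned, so a dedicated least-squares routine would likely be a better choice for long basis lists.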

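To get this best-fit curve as a Mathematica function that you can differentiate, define it with = (immediate assignment) rather than := so that the fit is computed once while x is still symbolic:

f[x_] = Fit[data, {1, x, x^2, x^3}, x]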

The change you are talking about then occurs where the second derivative is zero (an inflection point of the fitted curve), so to get that x value:

NSolve[f''[x] == 0, x]
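
The same step can be sketched in Python once you have polynomial coefficients in hand. The helper names poly_derivative and poly_eval and the coefficient values below are hypothetical, chosen only to illustrate the calculation; for a cubic the second derivative is linear, so its zero has a closed form.

```python
def poly_derivative(coeffs):
    """Coefficients of the derivative of c0 + c1*x + c2*x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical cubic fit: f(x) = 2 - 3x - 1.5x^2 + 0.25x^3.
coeffs = [2.0, -3.0, -1.5, 0.25]

first = poly_derivative(coeffs)   # coefficients of f'(x)
second = poly_derivative(first)   # coefficients of f''(x) = 2*c2 + 6*c3*x

# For a cubic, f'' is linear, so f''(x) = 0 at x = -c2 / (3*c3).
inflection = -second[0] / second[1]

print(inflection)                    # x where the second derivative vanishes
print(poly_eval(first, inflection))  # slope of the fitted curve at that point
```

Calling poly_eval(first, x) for each x in your data gives the derivative at every point, which is the other quantity the question asks for.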

Upvotes: 1
