Smoothness of a dataset with pandas

Question

I have a sample of two datasets, as below:

If I were to plot the two, Data A would have a much smoother line graph and Data B would have a spikier one. How can I use pandas to determine or differentiate the smoothness of a dataset, e.g. with a calculation on the data that gives an index I can equate to the smoothness of the data? I looked for a solution and there was a suggestion using the difference of standard deviation. That was based on R. Any ideas on this? What sort of calculation would give me what I want? Can anyone point me in the right direction?

Esuom · Accepted Answer

Standard deviation doesn't necessarily measure smoothness in the sense you seem to mean. A straight-line graph (y=x) with A:1 B:2 C:3 D:4 would be smooth by your definition, right? Whereas A:4 B:1 C:3 D:2 would not (it goes up and down, changing direction). I think what you are looking for is a change-of-slope calculation (the derivative of the function at different points, or the gradient).
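To make that point concrete, here is a small sketch using the two four-point examples above. The variable names and the A to D index labels are only for illustration. Both series contain the same values in a different order, so their standard deviations come out the same even though one plots as a straight line and the other zigzags.

```python
import pandas as pd

# Same four values, different order (illustrative data from the example above).
straight = pd.Series([1, 2, 3, 4], index=list("ABCD"))  # y = x, a straight line
zigzag = pd.Series([4, 1, 3, 2], index=list("ABCD"))    # goes up and down

# Standard deviation ignores ordering, so it cannot tell these two apart.
straight_std = straight.std()
zigzag_std = zigzag.std()

print(straight_std, zigzag_std)
```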

In this case it's actually quite simple. Just calculate the sum of the absolute differences between consecutive points. The dataset with the greater total is the more "spikey" one.

You can shift the data (Series.shift or DataFrame.shift in pandas), subtract the shifted values from the original, take the absolute value, and then take the sum.
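A minimal sketch of those steps, assuming each dataset is a one-dimensional pandas Series. The helper name total_abs_change and the sample values are made up for illustration. The first difference is NaN because the first point has nothing before it, and sum() skips NaN by default.

```python
import pandas as pd

def total_abs_change(series: pd.Series) -> float:
    """Sum of absolute differences between consecutive points (hypothetical helper)."""
    shifted = series.shift(1)                  # move every value down one position
    return float((series - shifted).abs().sum())  # NaN in the first row is skipped

# Illustrative data: the straight line and the zigzag from the answer.
smooth = pd.Series([1, 2, 3, 4], index=list("ABCD"))
spiky = pd.Series([4, 1, 3, 2], index=list("ABCD"))

smooth_score = total_abs_change(smooth)  # 1 + 1 + 1 = 3.0
spiky_score = total_abs_change(spiky)    # 3 + 2 + 1 = 6.0

print(smooth_score, spiky_score)
```

series.diff().abs().sum() should give the same number in one call. Note that the total grows with the number of points and with the scale of the values, so it is only a fair comparison between datasets of similar length and units.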
