Ocean Scientist

Reputation: 411

Calculate linear percentage difference

I have two related datasets, one of which can dip slightly below 0.

I am trying to calculate the 'linear' percent difference between the two.

I have written some example code. perc[1] is the standard percentage-difference method, but in the last two example cases the results are not 'linear' across the two orderings (i.e. -87.5 compared to 700). I need them to be equal in magnitude, or at least to have some linearity in their calculation, and the last three methods do seem to behave that way. I like perc[3] because it is just the absolute difference, amplified, but it is kind of unitless. Maybe [4] or [5], which use the mean of the two values as the denominator, is the most accurate for this use case?

1, 8: [12.5, -87.5, 700.0, -700, -155.55555555555557, 155.55555555555557]
8, 1: [800.0, 700.0, -87.5, 700, 155.55555555555557, -155.55555555555557]

Statistics is not my strong point. Can anyone provide a rationale for why I should use either [4] or [5]? I know that [0], [1], [2] and [3] are probably not the correct choices here.

import numpy as np
import matplotlib.pyplot as plt

def perc_calc(x, y):
    perc0 = (x / y) * 100                  # Ratio; the original one I used, but non-linear
    perc1 = ((x - y) / y) * 100            # Standard % difference with y as reference; still non-linear
    perc2 = ((y - x) / x) * 100            # Same idea with x as reference; result depends on which is used
    perc3 = (x - y) * 100                  # Just amplifies the raw difference
    perc4 = (x - y) / ((x + y) / 2) * 100  # Difference divided by the mean
    perc5 = (y - x) / ((x + y) / 2) * 100  # Opposite-sign difference divided by the mean
    return [perc0, perc1, perc2, perc3, perc4, perc5]


x=np.random.uniform(-0.005, 1, size=600)
y=np.random.uniform(0.005,1,size=600)

plt.plot(perc_calc(x,y)[3])
plt.show()

plt.plot(perc_calc(x,y)[4])
plt.show()

def example(x,y):
    print(str(x)+', '+str(y)+': '+str(perc_calc(x,y)))
#Example Cases:
example(5,10)
example(-1,10)
example(1,8)
example(8,1)

Upvotes: 1

Views: 412

Answers (1)

Kevin Languasco

Reputation: 2426

Referencing this Wikipedia article, the relative percentage is, in general, of the form

|x - y| / |f(x, y)|

The absolute value in |x - y| can be removed if you have a reference point, so as to get negative percentages. If that doesn't make sense to you, you should keep it.

The function |f(x,y)| is what is commonly called the scaling factor. You can choose between many options here, and it depends on the application.

You can take just f(x, y) = y, as you did in (1). This is usually done when comparing experimental and theoretical values, say after taking a measurement in some experiment, or when measuring the change with respect to a past state. But note that it needs a reference point (the theoretical value, or the value before the change) and it won't have the "linearity" property you are looking for, since the scaling factor changes when you swap x and y (for the values 1 and 2, the relative difference is 1/2 with one ordering and 1 with the other). For example, a value of 2 changing to 10 is a 400% increase, but a 10 becoming a 2 is an 80% decrease.
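The asymmetry described above can be checked with a few lines. This is only a sketch: `relative_change` is a hypothetical helper name, equivalent to perc1 from the question with the second argument acting as the reference point.

```python
def relative_change(new, reference):
    """Percent change of `new` relative to `reference` (scaling factor f = reference).

    Hypothetical helper for illustration; same formula as perc1 in the question.
    """
    return (new - reference) / reference * 100


up = relative_change(10, 2)    # 2 becoming 10
down = relative_change(2, 10)  # 10 becoming 2

# The two magnitudes differ because the denominator is different in each case.
print(up, down)
```

The same absolute step of 8 is divided by 2 in one direction and by 10 in the other, which is why the two percentages cannot match.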

So you need some function f that doesn't change on swapping the parameters. This is known in mathematics as a symmetric function. Many examples are shown in the article referenced before. I suggest (|x| + |y|)/2, but try out the others to see what makes more sense.

    perc6 = abs(x-y) / ((abs(x)+abs(y)) / 2) * 100

To test by plotting, add perc6 to the list returned by perc_calc, fix a value for y, say 10, and make a scatter plot of x against perc_calc(x, 10)[6].
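As a rough sketch of the suggestion above, perc6 can also be written as a standalone function and checked without plotting. The zero guard is an assumption added here (the original one-liner would divide by zero when x and y are both 0), and the sweep over x is only a stand-in for the scatter plot.

```python
def perc6(x, y):
    """Symmetric percent difference: |x - y| scaled by the mean of |x| and |y|."""
    scale = (abs(x) + abs(y)) / 2
    if scale == 0:
        # Assumption: treat x == y == 0 as "no difference" instead of dividing by zero.
        return 0.0
    return abs(x - y) / scale * 100


# Swapping the arguments does not change the result.
print(perc6(1, 8), perc6(8, 1))

# Fix y = 10 and sweep x, mirroring the suggested scatter plot.
y_fixed = 10
sweep = [(x, perc6(x, y_fixed)) for x in range(-5, 26, 5)]
for x, p in sweep:
    print(x, round(p, 1))
```

One thing worth noticing when trying this: since |x - y| can never exceed |x| + |y|, this measure tops out at 200, and it reaches that ceiling whenever x and y have opposite signs. That may matter for a dataset that dips slightly below 0.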

Upvotes: 1
