jeandemeusy

Reputation: 251

Finding outliers in circular data

I have a set of data that has a circular scale (angles from 0 to 360°). I know most of the values in the dataset are close to each other, but some are outliers. I want to determine which of them have to be eliminated.

The problem with a circular scale is the following (using an example): data = [350, 0, 10] is an array containing angles in degrees. The arithmetic mean of this array is 120. But considering their units, the mean value of 350°, 0° and 10° is 0°.

So the ordinary mean gives the wrong result here. The same problem arises when computing the standard deviation.

How do I do it?

Upvotes: 2

Views: 426

Answers (3)

hpchavaz

Reputation: 1388

Circular mean

You can replace each angle with the corresponding point (vector) on the unit circle, then define the mean as the angle of the sum of those vectors.

But beware: this gives a mean of about 26.6° for [0°, 0°, 90°], since arctan(1/2) ≈ 26.57°, and there is no mean for [0°, 180°] because the two vectors cancel out.
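A minimal sketch of that definition, assuming angles in degrees. The name `mean_direction` and the tolerance `eps` are illustrative choices, not part of the code further down:

```python
from math import atan2, cos, degrees, hypot, radians, sin


def mean_direction(angles_deg, eps=1e-9):
    """Angle (in degrees) of the summed unit vectors, or None if they cancel out."""
    x = sum(cos(radians(a)) for a in angles_deg)
    y = sum(sin(radians(a)) for a in angles_deg)
    # A (near-)zero resultant vector has no meaningful direction.
    if hypot(x, y) < eps:
        return None
    return degrees(atan2(y, x))


print(mean_direction([0, 0, 90]))    # roughly 26.57, i.e. arctan(1/2)
print(mean_direction([0, 180]))      # None: the two vectors cancel out
print(mean_direction([350, 0, 10]))  # very close to 0
```

The tolerance is there because floating-point sums rarely come out as exactly zero, even for opposite angles.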

Outliers

Outliers are the angles farthest from the mean, that is, those with the greatest absolute angular difference from it.
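A minimal sketch of that angular difference, assuming angles in degrees. `angular_distance` and `farthest_from` are hypothetical helper names used only for this illustration:

```python
def angular_distance(a, b, full_turn=360.0):
    """Smallest absolute difference between two angles, in [0, full_turn / 2]."""
    d = abs(a - b) % full_turn
    return min(d, full_turn - d)


def farthest_from(angles, center):
    """Return the angle with the greatest angular distance from center."""
    return max(angles, key=lambda a: angular_distance(a, center))


print(angular_distance(350, 10))         # 20.0, not 340
print(farthest_from([350, 0, 10, 90], 0))  # 90
```

Taking the difference modulo a full turn is what makes 350° and 10° count as close neighbours.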

Standard deviation

The standard deviation can be used to define outliers.

@coproc gives the corresponding code in their answer.

Interquartile range

The interquartile range can also be used. It is less sensitive to outlier values than the standard deviation, but in the circular case it could be irrelevant.

Anyway:

from functools import reduce
from math import degrees, radians, sin, cos, atan2, pi


def norm_angle(angle, degree_unit = True):
    """ Normalize an angle return in a value between ]180, 180] or ]pi, pi]."""
    mpi = 180 if degree_unit else pi
    angle = angle % (2 * mpi)
    return angle if abs(angle) <= mpi else angle - (1 if angle >= 0 else -1) * 2 * mpi


def circular_mean(angles, degree_unit = True):
    """ Returns the circular mean from a collection of angles. """
    angles = [radians(a) for a in angles] if degree_unit else angles
    x_sum, y_sum = reduce(lambda tup, ang: (tup[0]+cos(ang), tup[1]+sin(ang)), angles, (0,0))
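    # Use a small tolerance: floating-point sums are rarely exactly 0 (e.g. for [0, 180]).
    if abs(x_sum) < 1e-12 and abs(y_sum) < 1e-12: return None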
    return (degrees if degree_unit else lambda x:x)(atan2(y_sum, x_sum)) 


def circular_interquartiles_value(angles, degree_unit = True):
    """ Returns the circular interquartiles value from a collection of angles."""
    mean = circular_mean(angles, degree_unit=degree_unit)
    deltas = tuple(sorted([norm_angle(a - mean, degree_unit=degree_unit) for a in angles]))

    nb = len(deltas)
    nq1, nq3, direct = nb // 4, nb - nb // 4, (nb % 4) // 2

    q1 = deltas[nq1] if direct else (deltas[nq1-1] + deltas[nq1]) / 2
    q3 = deltas[nq3-1] if direct else(deltas[nq3-1] + deltas[nq3]) / 2

    return q3-q1


def circular_outliers(angles, coef = 1.5, values=True, degree_unit=True):
    """ Returns outliers from a collection of angles. """
    mean = circular_mean(angles, degree_unit=degree_unit)
    maxdelta = coef * circular_interquartiles_value(angles, degree_unit=degree_unit)
    deltas = [norm_angle(a - mean, degree_unit=degree_unit) for a in angles]

    return [z[0] if values else i for i, z in enumerate(zip(angles, deltas)) if abs(z[1]) > maxdelta]

Let's give it a try:

angles = [-179, -20, 350, 720, 10, 20, 179] # identical to [-179, -20, -10, 0, 10, 20, 179]
circular_mean(angles), circular_interquartiles_value(angles), circular_outliers(angles)

Output:

(-1.1650923760388311e-14, 40.000000000000014, [-179, 179])

As we might expect:

  • the circular_mean is near 0, as the list is symmetric about the 0° axis;
  • the circular_interquartiles_value is 40°, as the first quartile is -20° and the third quartile is 20°;
  • the outliers are correctly detected, with 350 and 720 being taken at their normalized values.

Upvotes: 0

Dan Nagle

Reputation: 5425

If you immediately convert the angle data (0..360) using either the sine or cosine function, you transform the data into the range [-1.0, 1.0].

In doing so you lose the information about which quadrant the angle was in, so you need to extract that information separately.

quadrant = [n // 90 for n in data] # values: 0, 1, 2, 3

You can fold the quadrants into one, and the sine or cosine transform of the result will be in the range [0.0, 1.0].

single_quadrant = [n % 90 for n in data] # values: 0, 1, ..., 89

Using both of these ideas, it's possible to map the data to the range 0.0 to 4.0 using either the sine or cosine function, like so:

import math

using_sine = [(n//90 + math.sin(math.radians(n % 90))) for n in data]

using_cosine = [(n//90 + math.cos(math.radians(n % 90))) for n in data]
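As an illustration only, the mapping can be applied to the sample `data = [350, 0, 10]` from the question. The helper name `fold_with` is hypothetical, and integer degrees in 0..359 are assumed:

```python
import math

data = [350, 0, 10]  # sample angles from the question, in degrees


def fold_with(func, angles):
    """Quadrant index (0..3) plus func applied to the angle folded into 0..89 degrees."""
    return [n // 90 + func(math.radians(n % 90)) for n in angles]


using_sine = fold_with(math.sin, data)
using_cosine = fold_with(math.cos, data)

print([round(v, 3) for v in using_sine])
print([round(v, 3) for v in using_cosine])
```

With the sine variant, 350° maps close to 4 while 0° and 10° map close to 0, so the wrap-around at 0°/360° still has to be handled separately.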

Upvotes: 0

coproc

Reputation: 6247

So you are given a list of angles and want to find the "mean" (average) angle and outliers. One simple possibility is to average the 2D vectors (cos(a), sin(a)) corresponding to the angles, convert the average back to an angle, and then compute the standard deviation of the angular differences from that mean:

from math import degrees, radians, sin, cos, atan2, sqrt  # sqrt is needed for stddev below

def absDiff_angle(a1, a2, fullAngle=360):
    a1,a2 = a1%fullAngle,a2%fullAngle
    if a1 >= a2: a1,a2 = a2,a1
    return min(a2-a1, a1+fullAngle-a2)

# sample input of angles 350,351,...359,0,...,10, 90
angles_deg = list(range(350,360)) + list(range(11)) + [90]

# compute corresponding 2D vectors
angles_rad = [radians(a) for a in angles_deg]
xVals = [cos(a) for a in angles_rad]
yVals = [sin(a) for a in angles_rad]

# average of 2D vectors
N = len(angles_rad)
xMean = sum(xVals)/N
yMean = sum(yVals)/N

# go back to angle
angleMean_rad = atan2(yMean,xMean)
angleMean_deg = degrees(angleMean_rad)

# filter outliers
square = lambda v: v*v
stddev = sqrt(sum([square(absDiff_angle(a, angleMean_deg)) for a in angles_deg])/(N-1))
MIN_DIST_OUTLIER = 3*stddev
isOutlier = lambda a: absDiff_angle(a, angleMean_deg) >= MIN_DIST_OUTLIER
outliers = [a for a in angles_deg if isOutlier(a)]

print(angleMean_deg)
print(outliers)

Note that outliers can distort the mean value and standard deviation. To be less sensitive to outliers, one can compute a histogram of the angles (for, e.g., the bins [0°, 10°[, [10°, 20°[, ..., [350°, 360°[), then select the angles from the bin with the most members and from its neighbouring bins, and compute the mean angle (and standard deviation) from those only.
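The histogram idea could be sketched roughly as follows. This is only one possible reading of it: the function name `robust_mean_deg` is hypothetical, `bin_width` is assumed to divide 360 evenly, and "neighbours" is taken to mean the one bin on each side, wrapping around at 0°/360°:

```python
from collections import Counter
from math import atan2, cos, degrees, radians, sin


def robust_mean_deg(angles_deg, bin_width=10):
    """Circular mean of the angles in the fullest bin and its two neighbours."""
    n_bins = 360 // bin_width
    # Bin index of every angle; the final modulo guards against rounding up to 360.
    bins = [int((a % 360) // bin_width) % n_bins for a in angles_deg]
    best = Counter(bins).most_common(1)[0][0]
    # Neighbouring bins wrap around, so bin 0 is next to the last bin.
    keep = {(best - 1) % n_bins, best, (best + 1) % n_bins}
    selected = [a for a, b in zip(angles_deg, bins) if b in keep]
    x = sum(cos(radians(a)) for a in selected)
    y = sum(sin(radians(a)) for a in selected)
    return degrees(atan2(y, x)) % 360, selected


angles_deg = list(range(350, 360)) + list(range(11)) + [90]
mean_deg, selected = robust_mean_deg(angles_deg)
print(mean_deg)
print(90 in selected)  # False: the stray value is not used for the mean
```

When two bins tie for the most members, which one wins depends on `Counter.most_common`, so the result can shift by a fraction of the bin width.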

Upvotes: 1
