limoan

Reputation: 105

ELKI, the DBOutlierDetection algorithm: What measure is d?

Can you tell me what unit the parameter d is measured in for the DBOutlierDetection algorithm (or DBOutlierScore)? Centimetres? Millimetres?

I need to somehow compare the area covered by parameter d with LOF's k.

Upvotes: 2

Views: 60

Answers (1)

Erich Schubert

Reputation: 8715

It depends on the distance function that you set with -algorithm.distancefunction.

The parameter is a distance; the semantic meaning of the distance depends on your data and distance function.

For example, if your data are latitude, longitude pairs:

  • Euclidean distance would be in degrees, a rather meaningless value near the poles due to the distortion (one degree of longitude at the north pole is virtually nothing, but it is a substantial distance along the equator)
  • Geodetic distance in ELKI uses meters. This is easier to parameterize.
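To illustrate the difference, here is a minimal Python sketch (not ELKI code). It compares plain Euclidean distance on (latitude, longitude) pairs with a haversine great-circle distance. The spherical Earth model, the radius constant and the function names are my assumptions for illustration; ELKI's own geodetic distance may use a different Earth model.

```python
import math

# Assumption: spherical Earth with the mean radius; not ELKI's implementation.
EARTH_RADIUS_M = 6_371_000.0


def euclidean_degrees(p, q):
    """Euclidean distance on raw (lat, lon) pairs; the result is in degrees."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def haversine_meters(p, q):
    """Great-circle distance between two (lat, lon) pairs, in meters."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two pairs of points, each one degree of longitude apart.
equator = ((0.0, 0.0), (0.0, 1.0))
near_pole = ((89.0, 0.0), (89.0, 1.0))

for name, (p, q) in (("equator", equator), ("near pole", near_pole)):
    print(name, euclidean_degrees(p, q), round(haversine_meters(p, q)))
```

Both pairs are exactly 1 degree apart in Euclidean terms, but the great-circle distance is roughly 111 km at the equator and only about 2 km at 89° north. A single threshold d in degrees would therefore mean very different things in different parts of the data set.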

Similarly, if you are using Euclidean distance and your

  • data is in meters, then Euclidean distance is in meters
  • data is in millimeters, then Euclidean distance is in millimeters
  • data is shoe size, weight, height and voltage, then using Euclidean distance does not make much sense, because you are measuring apples and oranges.

You can normalize or standardize the data. For example, if you standardize by mean and standard deviation, the original unit disappears, and Euclidean distance on such data is measured in "standard deviations". But that unit no longer means much on a multimodal distribution, which is common in outlier detection and clustering.
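A small sketch of the standardization point, in plain Python rather than ELKI. The sample values and the helper name `standardize` are made up for illustration, and I use the population standard deviation here as an assumption; other conventions differ only by a constant factor.

```python
import math
import statistics


def standardize(column):
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation (assumption)
    return [(x - mean) / std for x in column]


# The same hypothetical heights, once in meters and once in millimeters.
heights_m = [1.60, 1.70, 1.80, 1.90]
heights_mm = [h * 1000 for h in heights_m]

z_m = standardize(heights_m)
z_mm = standardize(heights_mm)

# After standardization the original unit no longer matters.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
           for a, b in zip(z_m, z_mm))
print(same)

# The distance between the first two records is now in "standard deviations".
print(abs(z_m[0] - z_m[1]))
```

The z-scores agree whether the input was in meters or millimeters, so a threshold d chosen on standardized data is a number of standard deviations, not a physical length.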

Upvotes: 2
