Reputation: 119
In my data I have multiple bill dates and multiple items, and each item sells a different amount each day.
I am looking for a metric that incorporates two things about this data. For example,
[0,1,0,0,0,0,1,1]
is more uniform and
[0,1,1,1,0,0,0,0]
is less uniform, where 1 means a purchase was made on that day and 0 means it wasn't. Note that I have many items like these, so I need a metric to rank them.
My final aim is a metric under which purchases spread evenly across the purchase dates score best, while a low total number of purchase days is penalized.
I have tried two methods so far:
wasserstein_distance, also known as earth mover's distance.
The problem with this metric is that it gives the same value for
wasserstein_distance([0,1,0,0,0,0,1,1], [1,1,1,1,1,1,1,1])
and
wasserstein_distance([0,1,1,1,0,0,0,0], [1,1,1,1,1,1,1,1]).
It also doesn't penalize the presence of too many zeros (see the check below).
Entropy: same penalization problem.
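For reference, here is a quick check of both problems (my own verification with scipy.stats; both vectors contain the same multiset of values, five 0s and three 1s, which is why the results match):

from scipy.stats import wasserstein_distance, entropy

a = [0,1,0,0,0,0,1,1]
b = [0,1,1,1,0,0,0,0]
ones = [1,1,1,1,1,1,1,1]

# Same multiset of values -> same empirical distribution -> same distance
print(wasserstein_distance(a, ones))  # 0.625
print(wasserstein_distance(b, ones))  # 0.625

# entropy() normalizes the vector to a probability mass function; three
# equal spikes give log(3) no matter where they sit or how many zeros
# surround them
print(entropy(a), entropy(b))  # both ~1.0986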
Note that I am also open to using the array of total quantity sold each day instead of the binary representation above.
Upvotes: 0
Views: 178
Reputation: 46908
Your question is not entirely clear, but I guess you are more interested in how spread out the purchases are, assuming a certain rate of success.
Distance measures calculate the overall difference between two vectors, so if the number of successes is approximately the same, you end up with the same distance, unsurprisingly.
So in the example you have given, we assume that the expected number of successes is the same. Then we can simply estimate the "waiting" times between purchases:
import numpy as np

ex1 = [0,1,0,0,0,0,1,1]
ex2 = [0,1,1,1,0,0,0,0]

# Mean gap between consecutive purchase days
np.mean(np.diff(np.where(ex1)[0]))
# 3.0
np.mean(np.diff(np.where(ex2)[0]))
# 1.0
So with the same number of successes, a shorter average waiting time means the successes are more clustered.
This is essentially a Poisson-process view of Bernoulli trials. If you have more data, i.e. a longer vector, with different probabilities of success and different spreads, a quick way to judge how spread out the successes are is to measure the dispersion of the times between successes.
Below I simulate two series of events over the same 500-day window, with different gap distributions:
np.random.seed(999)

# Series 1: gaps drawn from gamma(shape=4, scale=1) -- relatively regular
ex1 = np.zeros(500)
ex1[np.cumsum(np.random.gamma(4, 1, 123)).astype(int)] = 1

# Series 2: gaps drawn from gamma(shape=1.25, scale=6) -- much burstier
ex2 = np.zeros(500)
ex2[np.cumsum(np.random.gamma(1.25, 6, 68)).astype(int)] = 1
You can see that ex1 is more evenly spread, i.e. less clustered, than ex2:
import matplotlib.pyplot as plt

idx1 = np.where(ex1)[0]
idx2 = np.where(ex2)[0]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(idx1, np.ones_like(idx1), '|', color='k')    # events of ex1 at y=1
ax.plot(idx2, np.full_like(idx2, 2), '|', color='b') # events of ex2 at y=2
plt.show()
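As an aside, matplotlib's ax.eventplot draws this kind of raster directly (an alternative sketch, equivalent to the plot above):

# Same raster using matplotlib's built-in event plot
fig, ax = plt.subplots(figsize=(8, 4))
ax.eventplot([idx1, idx2], lineoffsets=[1, 2], colors=['k', 'b'])
plt.show()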
We can calculate the coefficient of variation of the waiting times, and ex2 has the higher (more clustered) value:
times_1 = np.diff(idx1)
np.std(times_1) / np.mean(times_1)
# 0.5221040055320324

times_2 = np.diff(idx2)
np.std(times_2) / np.mean(times_2)
# 0.8645205800519346
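To turn this into a ranking over many items, here is a minimal sketch; dispersion_score is a hypothetical helper of my own, and treating items with fewer than three purchase days as worst-ranked is an assumption, not part of the original answer:

# Sketch: rank items by the coefficient of variation of their purchase gaps
def dispersion_score(purchases):
    days = np.where(np.asarray(purchases))[0]
    if len(days) < 3:       # too few gaps to measure dispersion;
        return np.inf       # rank such sparse items last (my assumption)
    gaps = np.diff(days)
    return np.std(gaps) / np.mean(gaps)

items = {"item_a": ex1, "item_b": ex2}
# Lower score = more evenly spread purchases
print(sorted(items, key=lambda k: dispersion_score(items[k])))
# ['item_a', 'item_b']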
Upvotes: 0