Ace
Ace

Reputation: 869

Statistical Mode between 3 columns

I have a ~70K-entry table for orders, as follows:

Screenshot of table.

I would like to, on a per-customer basis, determine what the most common order is, as well as to what certainty that order is (sample size and probability).

This is what I have so far:

CREATE VIEW CustomerOrderProbabaility as 
SELECT Distinct(customerID)
        customerID,
        order,
        COUNT(*) as sampleSize
FROM (Select customerID, order1 AS order FROM orderTable UNION
      Select customerID, order2 AS order FROM orderTable UNION
      Select customerID, order3 AS order FROM orderTable
     )
GROUP BY customerID, order
ORDER BY customerID, COUNT(*) DESC;

And I'm getting a table of customerIds and order, but sampleSize is always 1. Where am I going wrong?

Upvotes: 0

Views: 26

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269703

I think you want UNION ALL as well as some other changes:

CREATE VIEW CustomerOrderProbabaility as 
    SELECT DISTINCT ON (customerID)
            customerID,
            order,
            COUNT(*) as sampleSize,
            SUM(COUNT(*)) OVER (PARTITION BY customerId) as totOrders
    FROM (Select customerID, order1 AS theorder FROM orderTable UNION ALL
          Select customerID, order2 AS theorder FROM orderTable UNION ALL
          Select customerID, order3 AS theorder FROM orderTable
         ) co
    GROUP BY customerID, theorder
    ORDER BY customerID, COUNT(*) DESC;

UNION removes duplicates.

Changes:

  • Renamed order to theorder. order is a keyword. Even if accepted as a column name, I don't think it is a good idea.
  • UNION ALL instead of UNION so duplicates are not removed.
  • DISTINCT ON instead of DISTINCT, because this is your intention.
  • Added TotOrders to count all orders for each customer.

Upvotes: 1

Related Questions