M.Naro
M.Naro

Reputation: 77

How to get the most frequent value SQL

I have a table Orders(id_trip, id_order), table Trip(id_hotel, id_bus, id_type_of_trip) and table Hotel(id_hotel, name).

I would like to get name of the most frequent hotel in table Orders.

SELECT hotel.name from Orders
 JOIN Trip
 on Orders.id_trip = Trip.id_hotel
 JOIN hotel
 on trip.id_hotel = hotel.id_hotel
  FROM (SELECT hotel.name, rank() over (order by cnt desc) rnk
          FROM (SELECT hotel.name, count(*) cnt
                  FROM Orders
                 GROUP BY hotel.name))
 WHERE rnk = 1;

Upvotes: 3

Views: 13026

Answers (4)

luis.espinal
luis.espinal

Reputation: 10519

** Getting the most recent statistical mode out of a data sample **

I know it's more than a year, but here's my answer. I came across this question hoping to find a simpler solution than what I know, but alas, nope.

I had a similar situation where I needed to get the mode from a data sample, with the requirement to get the mode of the most recently inserted value if there were multiple modes.

In such a case neither the STATS_MODE nor the LAST aggregate functions would do (as they would tend to return the first mode found, not necessarily the mode with the most recent entries.)

In my case it was easy to use the ROWNUM pseudo-column because the tables in question were performance metric tables that only experienced inserts (not updates)

In this oversimplified example, I'm using ROWNUM - it could easily be changed to a timestamp or sequence field if you have one.

 SELECT     VALUE
       FROM
        (SELECT     VALUE        ,
                COUNT( * ) CNT,
                MAX( R ) R
               FROM
                ( SELECT ID, ROWNUM R FROM FOO
                )
           GROUP BY ID
           ORDER BY CNT DESC,
                R DESC
        )
      WHERE
        (
            ROWNUM < 2
        );

That is, get the total count and max ROWNUM for each value (I'm assuming the values are discrete. If they aren't, this ain't gonna work.)

Then sort so that the ones with largest counts come first, and for those with the same count, the one with the largest ROWNUM (indicating most recent insertion in my case).

Then skim off the top row.

Your specific data model should have a way to discern the most recent (or the oldest or whatever) rows inserted in your table, and if there are collisions, then there's not much of a way other than using ROWNUM or getting a random sample of size 1.

If this doesn't work for your specific case, you'll have to create your own custom aggregator.

Now, if you don't care which mode Oracle is going to pick (your bizness case just requires a mode and that's it, then STATS_MODE will do fine.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269493

Here is one method:

select name
from (select h.name,
             row_number() over (order by count(*) desc) as seqnum  -- use `rank()` if you want duplicates
      from orders o join
           trip t 
           on o.id_trip = t.id_trip join -- this seems like the right join condition
           hotels h
           on t.id_hotel = h.id_hotel
     ) oth
where seqnum = 1;

Upvotes: 0

user5683823
user5683823

Reputation:

The "most frequently occurring value" in a distribution is a distinct concept in statistics, with a technical name. It's called the MODE of the distribution. And Oracle has the STATS_MODE() function for it. https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm

For example, using the EMP table in the standard SCOTT schema, select stats_mode(deptno) from scott.emp will return 30 - the number of the department with the most employees. (30 is the department "name" or number, it is NOT the number of employees in that department!)

In your case:

select stats_mode(h.name) from (the rest of your query)

Note: if two or more hotels are tied for "most frequent", then STATS_MODE() will return one of them (non-deterministic). If you need all the tied values, you will need a different solution - a good example is in the documentation (linked above). This is a documented flaw in Oracle's understanding and implementation of the statistical concept.

Upvotes: 11

Lukas Eder
Lukas Eder

Reputation: 220762

Use FIRST for a single result:

SELECT MAX(hotel.name) KEEP (DENSE_RANK FIRST ORDER BY cnt DESC) 
FROM (
  SELECT hotel.name, COUNT(*) cnt
  FROM orders
  JOIN trip USING (id_trip)
  JOIN hotel USING (id_hotel)
  GROUP BY hotel.name
) t

Upvotes: 1

Related Questions