TheLettuceMaster
TheLettuceMaster

Reputation: 15734

Finding Most Frequent Values in a Table Associated with a Parent Id Column in Another

I have two tables being joined. They have a one-to-many relationship. I would like to make a query that pulls the most frequent of the "many" table for each of the parent id's.

I have a query like this that is in the beginning phases of what I am trying to do:

SELECT  p.profile, up.value, COUNT(*) AS theCount FROM `profiles` AS p
JOIN user_profile AS up ON p.id = up.profile_id
GROUP BY `profile`, `value`
ORDER BY p.profile ASC, theCount DESC;

Now the problem with this query is that it will show all values from the many table, and just group them with most to least. For example:

If table profile has values of val 1,val 2 and table user_profiles has values of Apple, Apple, Orange with a parent_id of val 1, and Pear for val 2 in the first table, the query above will group them like this:

val 1, Apple, 2

val 1, Orange, 1

val 2, Pear, 1

Now what I WANT is this:

val 1, Apple, 2

val 2, Pear, 1

I only want to show the highest value for the parent. Or if the case there is only one value (e.g. val 2) then show that.

Now, I can easily do this in a subquery (Do a Limit 1 for each parent value; order by theCount DESC). However, is there (1) a way to do this without a subquery? or (2) a MySQL function or some other way to do this? I am trying hard to keep these queries very quick and high performance.

If you can show me that a subquery will not hinder performance here, I will select that as an answer as well.

Upvotes: 1

Views: 65

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270713

With this type of problem, I think the group_concat()/substring_index() trick is the easiest method in MySQL:

SELECT profile, substring_index(group_concat(up.value order by theCount desc), ',', 1) as MostCommonValue,
       theCount
FROM (SELECT  p.profile, up.value, COUNT(*) AS theCount
      FROM `profiles` p JOIN
            user_profile up 
            ON p.id = up.profile_id
      GROUP BY `profile`, `value`
     ) pv
GROUP BY profile
ORDER BY p.profile ASC, theCount DESC;

This method has short-comings. For instance, if value can contain a comma, you need a different separator. More importantly, there is a maximum length to the group_concat() intermediate value (but you set the parameter to have a larger value). But, it works quite well in many cases.

Upvotes: 2

Related Questions