Brad Stewart
Brad Stewart

Reputation: 119

Select duplicates/repeating values, ungrouped

I am trying to return a list of ungrouped values having a max defined number of repeating values. I have a list of values retrieved as such:

select TagDirID from tags where id = '550'

The results would be:

9508
10382
10672
65454
65454
65454
65454

As you can see there are 4 repeats of 65454. I would like to return a list that has a user defined max number of repeats for a TagDirID. For instance, selecting only 3 repeats or less:

9508
10382
10672
65454
65454
65454

All the methods I've found return a grouped list, i would like to maintain the individual items. Is this something that can be done in a query? There is a primary key available, TagID.

edit: what this does is select all the tags for an item, id = '550'. because it's user content sometimes people tag the same things multiple times and i'm trying to cut down the duplicates i show.

edit 2: so, while the accepted answer worked for me i found it was a little too slow for what i needed so i came up with a php solution:

function get_tags($ID = '', $tags_to_keep = 3)
{   
    // Select all tags.
    $query = "select TagDirID, Tag from tags where id = '$ID'";

    $tags_result = mysql_query($query); 

    $num_results = mysql_num_rows($tags_result);

    for ($i=0; $i<$num_results; $i++)
    {
        //Get tag topics
        $tags_row = mysql_fetch_array($tags_result);

        //build array of items already found with counts
        $tags_count = array_count_values($tags_filter);

        //if number of tags already found($tags_count/$tags_filter) is less than or equal to tags_to_keep then add to filtered array and return array.
        if($tags_count[$tags_row['TagDirID']] <= $tags_to_keep)
        {
            $tags_filter[$i] = $tags_row['TagDirID'];
            $tags[$i] = $tags_row['Tag'];
        }           
    }

    return $tags;
}

Upvotes: 1

Views: 463

Answers (2)

Thomas
Thomas

Reputation: 64635

Select TagID, TagDirID
From    (
        Select T1.TagID, T1.TagDirID
             , (Select Count(*)
                From tags As T2
                Where T2.TagDirID = T1.TagDirID
                    And T2.TagID < T1.TagID) As Rnk
        Where T1.id = '550'
        From tags As T1
        ) As T
Where T.Rnk < 3

Another way of writing the same query:

Select TagID, TagDirID
From    (
        Select T1.TagID, T1.TagDirID, Count( T2.TagID ) As Rnk
        From tags As T1
            Left Join tags As T2
                On T2.TagDirID = T1.TagDirID
                    And T2.TagID < T1.TagID
        Where T1.id = '550'
        Group By T1.TagID, T1.TagDirID
        ) As T
Where T.Rnk < 3

The approach here is to mimic a ranking function which would sequence the rows for each grouping of TagDirID. Thus, the inner query in either of the two above solutions should give you something like:

TagID | TagDirID | Rnk
1     | 9508     | 0
2     | 10382    | 0
3     | 10672    | 0
4     | 65454    | 0
5     | 65454    | 1
6     | 65454    | 2
7     | 65454    | 3

With the rows numbered within each grouping, we can now filter down our results so that we only get a maximum number of rows in any given group. The ISO/ANSI solution would be to use the ROW_NUMBER ranking function which isn't yet supported by MySQL.

Upvotes: 2

Gabriel
Gabriel

Reputation: 1833

As the repeated values are all equal, you can query for the number of repetions of distinct items. It would be something like this, for 3 items or less:

SELECT T.tdid, T.cnt
FROM ( 
      SELECT distinct(TagDirID) as tdid, 
             (SELECT COUNT(*) FROM tags WHERE id = tdid) as cnt 
      FROM tags
) as T
WHERE T.tdid = '550' AND T.cnt < 3;

So the result would be a little different from what you had (not the duplicate items, but one item and the number of duplications), but I think it will do.

Upvotes: 0

Related Questions