Asha
Asha

Reputation: 4311

sql logical compression of records

I have a table in SQL with more than 1 million records which I want to compress using following algorithm ,and now I'm looking for the best way to do that ,preferably without using a cursor .

if the table contains all 10 possible last digits(from 0 to 9) for a number (like 252637 in following example) we will find the most used Source (in our example 'A') and then remove all digits where Source = 'A' and insert the collapsed digit instead of that (here 252637) . the example below would help for better understanding.

Original table :

Digit(bigint)| Source
             | 
2526370      | A
2526371      | A
2526372      | A
2526373      | B
2526374      | C
2526375      | A
2526376      | B
2526377      | A
2526378      | B
2526379      | B

Compressed result :

252637       |A
2526373      |B
2526374      |C
2526376      |B
2526378      |B
2526379      |B

Upvotes: 1

Views: 162

Answers (2)

sheikhjabootie
sheikhjabootie

Reputation: 7376

This is just another version of Tom Morgan's accepted answer. It uses division instead of substring to trim the least significant digit off the BIGINT digit column:

SELECT     
    t.Digit/10      
    (
        -- Foreach t, get the Source character that is most abundant (statistical mode).
        SELECT TOP 1 
            Source         
        FROM 
            table i         
        WHERE 
            (i.Digit/10) = (t.Digit/10)         
        GROUP BY 
            i.Source         
        ORDER BY 
            COUNT(*) DESC     
    )  
FROM 
    table t     
GROUP BY 
    t.Digit/10  
HAVING 
    COUNT(*) = 10 

I think it'll be faster, but you should test it and see.

Upvotes: 2

Tom Morgan
Tom Morgan

Reputation: 2365

You could identify the rows which are candidates for compression without a cursor (I think) by GROUPing by a substring of the Digit (the length -1) HAVING count = 10. That would identify digits with 10 child rows. You could use this list to insert to a new table, then use it again to delete from the original table. What would be left would be rows that don't have all 10, which you'd also want to insert to the new table (or copy the new data back to the original).

Does that makes sense? I can write it out a bit better if it doesn't.

Possible SQL Solution:

SELECT
    SUBSTRING(t.Digit,0,len(t.Digit)-1)

    (SELECT TOP 1 Source
        FROM innerTable i
        WHERE SUBSTRING(i.Digit,0,len(i.Digit)-1)
                = SUBSTRING(t.Digit,0,len(t.Digit)-1)
        GROUP BY i.Source
        ORDER BY COUNT(*) DESC
    )

FROM table t
    GROUP BY SUBSTRING(t.Digit,0,len(t.Digit)-1)
HAVING COUNT(*) = 10

Upvotes: 0

Related Questions