ou812
ou812

Reputation: 93

Does this require recursive CTE, just creative window functions, a loop?

I cannot for the life of me figure out how to get a weighted ranking for scores across X categories. For example, the student needs to answer 10 questions across 3 categories (both # of questions and # of categories will be variable eventually). To get a total score the top 1 score in each of the X (3) categories will be added to whatever is left to add up to 10 total question scores.

Here is the data. I used a CASE WHEN Row_Number() to get the TopInCat

http://sqlfiddle.com/#!6/e6e9f/1

The fiddle has more students.

| Question | Student | Category | Score | TopInCat |
|----------|---------|----------|-------|----------|
|   120149 |     125 |        6 | 1     |        1 |
|   120127 |     125 |        6 | 0.9   |        0 |
|   120124 |     125 |        6 | 0.8   |        0 |
|   120125 |     125 |        6 | 0.7   |        0 |
|   120130 |     125 |        6 | 0.6   |        0 |
|   120166 |     125 |        6 | 0.5   |        0 |
|   120161 |     125 |        6 | 0.4   |        0 |
|   120138 |     125 |        4 | 0.15  |        1 |
|   120069 |     125 |        4 | 0.15  |        0 |
|   120022 |     125 |        4 | 0.15  |        0 |
|   120002 |     125 |        4 | 0.15  |        0 |
|   120068 |     125 |        2 | 0.01  |        1 |
|   120050 |     125 |        3 | 0.05  |        1 |
|   120139 |     125 |        2 | 0     |        0 |
|   120156 |     125 |        2 | 0     |        0 |

This is how I envision it needs to look, but it doesn't have to be exactly this. I just need to have 10 questions by 3 categories detail data in a way that would allow me to sum and average the Sort 1-10 column below. The 999's could be null or whatever as long as I can sum whats important and present the details.

| Question | Student | Category | Score | TopInCat | Sort |
|----------|---------|----------|-------|----------|------|
|   120149 |     125 |        6 | 1     |        1 |    1 |
|   120138 |     125 |        4 | 0.15  |        1 |    2 |
|   120068 |     125 |        2 | 0.01  |        1 |    3 |
|   120127 |     125 |        6 | 0.9   |        0 |    4 |
|   120124 |     125 |        6 | 0.8   |        0 |    5 |
|   120125 |     125 |        6 | 0.7   |        0 |    6 |
|   120130 |     125 |        6 | 0.6   |        0 |    7 |
|   120166 |     125 |        6 | 0.5   |        0 |    8 |
|   120161 |     125 |        6 | 0.4   |        0 |    9 |
|   120069 |     125 |        4 | 0.15  |        0 |   10 |
|   120022 |     125 |        4 | 0.15  |        0 |  999 |
|   120002 |     125 |        4 | 0.15  |        0 |  999 |
|   120050 |     125 |        3 | 0.05  |        1 |  999 |
|   120139 |     125 |        2 | 0     |        0 |  999 |
|   120156 |     125 |        2 | 0     |        0 |  999 |

One last thing, the category no longer matters once the X (3) threshold is met. So a 4th category would just sort normally.

| Question | Student | Category | Score | TopInCat | Sort |
|----------|---------|----------|-------|----------|------|
|   120149 |     126 |        6 | 1     |        1 |    1 |
|   120138 |     126 |        4 | 0.75  |        1 |    2 |
|   120068 |     126 |        2 | 0.50  |        1 |    3 |
|   120127 |     126 |        6 | 0.9   |        0 |    4 |
|   120124 |     126 |        6 | 0.8   |        0 |    5 |
|   120125 |     126 |        6 | 0.7   |        0 |    6 |
|   120130 |     126 |        6 | 0.6   |        0 |    7 |
|   120166 |     126 |        6 | 0.5   |        0 |    8 |
|   120050 |     126 |        3 | 0.45  |        1 |    9 |********
|   120161 |     126 |        6 | 0.4   |        0 |   10 |
|   120069 |     126 |        4 | 0.15  |        0 |  999 |
|   120022 |     126 |        4 | 0.15  |        0 |  999 |
|   120002 |     126 |        4 | 0.15  |        0 |  999 |
|   120139 |     126 |        2 | 0     |        0 |  999 |
|   120156 |     126 |        2 | 0     |        0 |  999 |

I really appreciate any help. Been banging my head on this for a few days.

Upvotes: 4

Views: 89

Answers (1)

AakashM
AakashM

Reputation: 63378

With such matters I like to proceed with a 'building blocks' approach. Following the maxim of first make it work, then if you need to make it fast, this first step is often enough.

So, given

CREATE TABLE WeightedScores
    ([Question] int, [Student] int, [Category] int, [Score] dec(3,2))
;

and your sample data

INSERT INTO WeightedScores
    ([Question], [Student], [Category], [Score])
VALUES
    (120161, 123, 6, 1),    (120166, 123, 6, 0.64),    (120138, 123, 4, 0.57),    (120069, 123, 4, 0.5),
    (120068, 123, 2, 0.33),    (120022, 123, 4, 0.18),    (120061, 123, 6, 0),    (120002, 123, 4, 0),
    (120124, 123, 6, 0),    (120125, 123, 6, 0),    (120137, 123, 6, 0),    (120154, 123, 6, 0),
    (120155, 123, 6, 0),   (120156, 123, 6, 0),    (120139, 124, 2, 1),    (120156, 124, 2, 1),
    (120050, 124, 3, 0.88),    (120068, 124, 2, 0.87),    (120161, 124, 6, 0.87),    (120138, 124, 4, 0.85),
    (120069, 124, 4, 0.51),    (120166, 124, 6, 0.5),    (120022, 124, 4, 0.43),    (120002, 124, 4, 0),
    (120130, 124, 6, 0),    (120125, 124, 6, 0),    (120124, 124, 6, 0),    (120127, 124, 6, 0),
    (120149, 124, 6, 0),    (120149, 125, 6, 1),    (120127, 125, 6, 0.9),    (120124, 125, 6, 0.8),
    (120125, 125, 6, 0.7),    (120130, 125, 6, 0.6),    (120166, 125, 6, 0.5),    (120161, 125, 6, 0.4),
    (120138, 125, 4, 0.15),    (120069, 125, 4, 0.15),    (120022, 125, 4, 0.15),    (120002, 125, 4, 0.15),
    (120068, 125, 2, 0.01),    (120050, 125, 3, 0.05),    (120139, 125, 2, 0),    (120156, 125, 2, 0),
    (120149, 126, 6, 1),    (120138, 126, 4, 0.75),    (120068, 126, 2, 0.50),    (120127, 126, 6, 0.9),
    (120124, 126, 6, 0.8),    (120125, 126, 6, 0.7),    (120130, 126, 6, 0.6),    (120166, 126, 6, 0.5),
    (120050, 126, 3, 0.45),    (120161, 126, 6, 0.4),    (120069, 126, 4, 0.15),    (120022, 126, 4, 0.15),
    (120002, 126, 4, 0.15),    (120139, 126, 2, 0),    (120156, 126, 2, 0)
;

let's proceed.


The complicated part here is identifying the top three top-in-category questions; the others of the ten questions of interest per student are simply sorted by score, which is easy. So let's start with identifying the top three top-in-category questions.

First, assign to each row a row number giving the ordering of that score within the category, for the student:

;WITH Numbered1 ( Question, Student, Category, Score, SeqInStudentCategory ) AS
(
    SELECT Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student, Category ORDER BY Score DESC) SeqInStudentCategory 
    FROM WeightedScores
)

Now we are only interested in rows where SeqInStudentCategory is 1. Considering only such rows, let's order them by score within student, and number those rows:

-- within the preceding WITH
, Numbered2 ( Question, Student, Category, Score, SeqInStudent ) AS
(
    SELECT 
        Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
    FROM
        Numbered1
    WHERE
        SeqInStudentCategory = 1
)

Now we are only interested in rows where SeqInStudent is at most 3. Let's pull them out, so that we know to include it (and exclude it from the simple sort by score, that we will use to make up the remaining seven rows):

-- within the preceding WITH
, TopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
     SELECT Question, Student, Category, Score, SeqInStudent FROM Numbered2 WHERE SeqInStudent <= 3
)

Now we have the three top-in-category questions for each student. We now need to identify and order by score the not top-in-category questions for each student:

-- within the preceding WITH
, NotTopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
    SELECT
        Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
    FROM
        WeightedScores WS
    WHERE
        NOT EXISTS ( SELECT 1 FROM TopInCat T WHERE T.Question = WS.Question AND T.Student = WS.Student )
)

Finally we combine TopInCat with NotTopInCat, applying an appropriate offset and restriction to NotTopInCat.SeqInStudent - we need to add 3 to the raw value, and take the top 7 (which is 10 - 3):

-- within the preceding WITH
, Combined ( Question, Student, Category, Score, CombinedSeq ) AS
(
    SELECT
        Question, Student, Category, Score, SeqInStudent AS CombinedSeq
    FROM
        TopInCat
    UNION
    SELECT
        Question, Student, Category, Score, SeqInStudent + 3 AS CombinedSeq
    FROM
        NotTopInCat
    WHERE
        SeqInStudent <= 10 - 3
)

To get our final results:

SELECT * FROM Combined ORDER BY Student, CombinedSeq
;

You can see the results on sqlfiddle.


Note that here I have assumed that every student will always have answers from at least three categories. Also, the final output doesn't have a TopInCat column, but hopefully you will see how to regain that if you want it.

Also, "(both # of questions and # of categories will be variable eventually)" should be relatively straightforward to deal with here. But watch out for my assumption that (in this case) 3 categories will definitely be present in the answers of each student.

Upvotes: 2

Related Questions