Tom Nice
Tom Nice

Reputation: 1

SQL - Field outputted as a proportion by dividing by frequency of another field

I'm trying to show a 'proportion' field for website user data where proportion will be the measure of how much a certain data source contributes to their conversion (whatever that may be, it is irrelevant).

This is the kind of output I'm hoping to achieve with SQL:

Month  | ID | Country | Data Source |Proportion
Jan-15 | 1  | UK      |   GA        |    0.5
Jan-15 | 1  | UK      |   Omniture  |    0.5
Jan-15 | 2  | France  |   GA        |     1
Jan-15 | 3  | Germany |   GA        |    0.25
Jan-15 | 3  | Germany |   Omniture  |    0.25
Jan-15 | 3  | Germany |   Email     |    0.25
Jan-15 | 3  | Germany |   Moz       |    0.25
Feb-15 | 1  | UK      |   GA        |    0.5
Feb-15 | 1  | UK      |   Omniture  |    0.5
Feb-15 | 2  | France  |   Omniture  |    0.5
Feb-15 | 2  | France  |   GA        |    0.5
Feb-15 | 3  | Germany |   Omniture  |    0.33
Feb-15 | 3  | Germany |   Email     |    0.33
Feb-15 | 3  | Germany |   Moz       |    0.33
Mar-15 | 1  | UK      |   Omniture  |    0.5
Mar-15 | 1  | UK      |   GA        |    0.5
Mar-15 | 2  | France  |   Omniture  |    0.5
Mar-15 | 2  | France  |   Email     |    0.5

And this is the SQL that I am currently working and failing with:

SELECT
    MONTH(registrationDate), country, DataSource, 1/COUNT(ID)
FROM
    data_table
WHERE
    registrationDate IS NOT NULL
GROUP BY
    MONTH(registrationDate), ID

This is only giving one instance of the proportion. Using the example above, user with ID 1 in January will only have one record with Proportion = 0.5.

Any help in showing this proportion value correctly shared between data sources would be much appreciated!

Upvotes: 0

Views: 65

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269593

You need to combine the results with the original data. Here is a method using JOIN:

SELECT dt.*, ddt.value
FROM data_table dt JOIN
     (SELECT MONTH(registrationDate) as mon, ID, 
             1.0/COUNT(applicantId) as value
      FROM data_table
      WHERE registrationDate IS NOT NULL
      GROUP BY MONTH(registrationDate), ID
     ) ddt
     ON ddt.id = dt.id AND 
        ddt.mon = MONTH(dt.registrationDate);

Your question has ID, ApplicationId, and RegistrationId sprinkled throughout. I'm not sure what the right column(s) to use are.

EDIT:

To include year (which really is a good idea in all cases):

SELECT dt.*, ddt.value
FROM data_table dt JOIN
     (SELECT YEAR(registrationDate) as yyyy, MONTH(registrationDate) as mon, ID, 
             1.0/COUNT(applicantId) as value
      FROM data_table
      WHERE registrationDate IS NOT NULL
      GROUP BY YEAR(registrationDate), MONTH(registrationDate), ID
     ) ddt
     ON ddt.id = dt.id AND 
        ddt.mon = MONTH(dt.registrationDate) AND
        ddt.yyyy = YEAR(dt.registrationDate);

Upvotes: 1

Related Questions