ely

Reputation: 77484

MySQL: Using IF statement to pivot, but getting lots of NULL values

I have some data organized by quintile labels (-1, 1, 2, 3, 4, 5). For each of these values in a Quintile column, there is a value in another column called ret. Lastly, there is a column of dates containing month-end dates as integers.

My goal is to visualize all of the Quintile returns data at the same time, each as its own column, with only the date column acting like an index.

Essentially, I want to pivot on the Quintile column and I have seen other places advising the use of IF statements in MySQL as a way to achieve this.

For example, here is a query that would show one Quintile's worth of the data:

select yearmonth, ret
from quintile_returns
where Quintile=1

But I don't want to repeat this for all Quintile labels, save out the data individually, and piece it together in Python Pandas or Excel or something. I want to make SQL show it as distinct columns.

Here is the query I use when I try this IF-statement-style poor man's pivot:

select yearmonth, 
       IF(Quintile=1, ret, NULL) as Q1_ret,
       IF(Quintile=2, ret, NULL) as Q2_ret
from quintile_returns

I basically get a diagonal of valid data back. Every row where Quintile is not 1 still shows up with NULL in Q1_ret, and likewise every row where Quintile is not 2 shows up with NULL in Q2_ret.

How do I avoid all of these extra NULL values? Basically, I want to tell SQL to return the column's value only if the condition is satisfied, and not to use NULL or anything else as a default else-like placeholder.

Is there a way to do this that does not involve nested join-type statements?

Upvotes: 2

Views: 1438

Answers (2)

MvG

Reputation: 60988

As you want to have only one row of output for multiple rows of input data, you have to aggregate your values. In this case you want to group them by yearmonth. One possible (though not particularly portable) way would be the following:

SELECT yearmonth
     , SUM((Quintile=1)*ret) AS Q1
     , SUM((Quintile=2)*ret) AS Q2
FROM quintile_returns
GROUP BY yearmonth

This slightly hackish approach makes use of the fact that a comparison like Quintile=1 in MySQL yields an integer, 0 for false and 1 for true. So you take 1*ret=ret for a matching Quintile, and 0*ret=0 for others. IF() is itself MySQL-specific, so for real portability you would swap in a standard CASE expression, but if you want things to be clearer, you could also write this as

SELECT yearmonth
     , SUM(IF(Quintile=1, ret, 0)) AS Q1
     , SUM(IF(Quintile=2, ret, 0)) AS Q2
FROM quintile_returns
GROUP BY yearmonth
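
To try the grouped pivot without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The sample rows are made up for illustration, and the query uses a standard CASE expression in place of MySQL's IF(), since CASE runs on both engines. It also runs the comparison-times-value variant, which happens to work in SQLite too because comparisons there evaluate to 0 or 1.

```python
import sqlite3

# Hypothetical sample rows (yearmonth, Quintile, ret); values are invented.
rows = [
    (201201, 1, 0.5), (201201, 2, 1.5),
    (201202, 1, -0.25), (201202, 2, 0.75),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quintile_returns (yearmonth INTEGER, Quintile INTEGER, ret REAL)"
)
conn.executemany("INSERT INTO quintile_returns VALUES (?, ?, ?)", rows)

# One output row per yearmonth; each SUM only picks up the matching Quintile.
case_sql = """
SELECT yearmonth
     , SUM(CASE WHEN Quintile = 1 THEN ret ELSE 0 END) AS Q1
     , SUM(CASE WHEN Quintile = 2 THEN ret ELSE 0 END) AS Q2
FROM quintile_returns
GROUP BY yearmonth
ORDER BY yearmonth
"""

# Same aggregation using the boolean-as-integer multiplication.
mult_sql = """
SELECT yearmonth
     , SUM((Quintile = 1) * ret) AS Q1
     , SUM((Quintile = 2) * ret) AS Q2
FROM quintile_returns
GROUP BY yearmonth
ORDER BY yearmonth
"""

pivot_case = conn.execute(case_sql).fetchall()
pivot_mult = conn.execute(mult_sql).fetchall()

for row in pivot_case:
    print(row)
```

With this sample data both queries return the same two rows, one per yearmonth, with no diagonal of NULLs.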

Upvotes: 2

Anton

Reputation: 4018

You can use GROUP BY to show only one row for each yearmonth value, and then wrap your IF() expressions in SUM() so that a ret value is only summed when that column's IF() condition evaluates to TRUE:

SELECT `yearmonth`,
    SUM(IF(`Quintile` = 1, ret, NULL)) as `Q1_ret`,
    SUM(IF(`Quintile` = 2, ret, NULL)) as `Q2_ret`
FROM `quintile_returns`
GROUP BY `yearmonth`

Otherwise, you had the right idea with the IF() statements.
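
One practical difference between a NULL default and a 0 default shows up when a yearmonth has no row for a given Quintile. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for MySQL; the sample data is invented, and CASE is used in place of IF() so the query runs on SQLite. Like MySQL, SQLite's SUM() skips NULL inputs and returns NULL when there is nothing non-NULL to add.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quintile_returns (yearmonth INTEGER, Quintile INTEGER, ret REAL)"
)
# Hypothetical data: month 201202 deliberately has no Quintile 2 row.
conn.executemany(
    "INSERT INTO quintile_returns VALUES (?, ?, ?)",
    [(201201, 1, 0.5), (201201, 2, 1.5), (201202, 1, -0.25)],
)

compare_sql = """
SELECT yearmonth
     , SUM(CASE WHEN Quintile = 2 THEN ret ELSE NULL END) AS Q2_null_default
     , SUM(CASE WHEN Quintile = 2 THEN ret ELSE 0 END)    AS Q2_zero_default
FROM quintile_returns
GROUP BY yearmonth
ORDER BY yearmonth
"""
result = conn.execute(compare_sql).fetchall()

for row in result:
    print(row)
```

For the month with no Quintile 2 row, the NULL default yields NULL (None in Python), which marks the value as missing, while the 0 default yields 0, which is indistinguishable from a genuine zero return. Which behavior is preferable depends on how the missing months should be treated downstream.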

Upvotes: 1
