Reputation: 77484
I have some data organized by quintile labels (-1, 1, 2, 3, 4, 5). For each of these values in a Quintile column, there is a value in another column called ret. Lastly, there is a column of dates containing month-end dates as integers.
My goal is to visualize all of the Quintile returns data at the same time, each as its own column, with only the date column acting like an index.
Essentially, I want to pivot on the Quintile column, and I have seen other places advising the use of IF statements in MySQL as a way to achieve this.
For example, here is a query that would show one Quintile's worth of the data:
select yearmonth, ret
from quintile_returns
where Quintile=1
But I don't want to repeat this for all Quintile labels, save out the data individually, and piece it together in Python Pandas or Excel or something. I want to make SQL show it as distinct columns.
Here is the query I use when I try this IF-statement-style poor man's pivot:
select yearmonth,
IF(Quintile=1, ret, NULL) as Q1_ret,
IF(Quintile=2, ret, NULL) as Q2_ret
from quintile_returns
I basically get a diagonal of valid data back. All the rows where the Quintile is not 1 still show up, populated with NULL, and then so on for Quintile 2.
How do I avoid all of these extra NULL values? Basically, I want to tell SQL to return the column's value only if the condition is satisfied, and not to use NULL or anything else as a default else-like placeholder.
Is there a way to do this that does not involve nested join-type statements?
Upvotes: 2
Views: 1438
Reputation: 60988
As you want to have only one row of output for multiple rows of input data, you have to aggregate your values. In this case you want to group them by yearmonth. One possible (though not particularly portable) way would be the following:
SELECT yearmonth
, SUM((Quintile=1)*ret) AS Q1
, SUM((Quintile=2)*ret) AS Q2
FROM quintile_returns
GROUP BY yearmonth
This slightly hackish approach makes use of the fact that a comparison like Quintile=1 in MySQL yields an integer: 0 for false and 1 for true. So you take 1*ret=ret for a matching Quintile, and 0*ret=0 for others. If you want things to be clearer and somewhat more portable, you could also write this with IF (the standard CASE WHEN ... THEN ... ELSE ... END expression would be more portable still):
SELECT yearmonth
, SUM(IF(Quintile=1, ret, 0)) AS Q1
, SUM(IF(Quintile=2, ret, 0)) AS Q2
FROM quintile_returns
GROUP BY yearmonth
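To check the grouped query on a small example, here is a minimal sketch in Python. It makes several assumptions that are not in the question: it uses an in-memory SQLite database as a stand-in for MySQL, it writes the condition with CASE WHEN instead of IF() so the same SQL should run on both, and the sample rows are made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quintile_returns (yearmonth INTEGER, Quintile INTEGER, ret REAL)"
)

# Hypothetical sample data: two months, two quintiles each.
rows = [
    (201001, 1, 0.5), (201001, 2, 0.25),
    (201002, 1, -0.5), (201002, 2, 0.75),
]
conn.executemany("INSERT INTO quintile_returns VALUES (?, ?, ?)", rows)

# Conditional aggregation: one output row per yearmonth,
# one column per Quintile label.
pivot_sql = """
SELECT yearmonth,
       SUM(CASE WHEN Quintile = 1 THEN ret ELSE 0 END) AS Q1,
       SUM(CASE WHEN Quintile = 2 THEN ret ELSE 0 END) AS Q2
FROM quintile_returns
GROUP BY yearmonth
ORDER BY yearmonth
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

Each month now comes back as a single row instead of a diagonal of NULLs, because GROUP BY collapses the per-quintile rows and SUM picks out the one matching value per column.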
Upvotes: 2
Reputation: 4018
You can use GROUP BY to show only one row for each yearmonth value, and then SUM() along with your IF() statements so that the ret values are only summed when the column's IF() condition evaluates to TRUE:
SELECT `yearmonth`,
SUM(IF(`Quintile` = 1, ret, NULL)) as `Q1_ret`,
SUM(IF(`Quintile` = 2, ret, NULL)) as `Q2_ret`
FROM `quintile_returns`
GROUP BY `yearmonth`
Otherwise, you had the right idea with the IF() statements.
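One practical difference between a NULL default and a 0 default shows up when a month has no row at all for a given Quintile. The sketch below illustrates this under the same kind of assumptions as any toy example: SQLite in memory instead of MySQL, CASE WHEN in place of IF(), and invented sample data in which 201002 has no Quintile 2 row.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quintile_returns (yearmonth INTEGER, Quintile INTEGER, ret REAL)"
)

# Hypothetical data: month 201002 is missing its Quintile 2 row.
conn.executemany(
    "INSERT INTO quintile_returns VALUES (?, ?, ?)",
    [(201001, 1, 0.5), (201001, 2, 0.25), (201002, 1, -0.5)],
)

# No ELSE branch: non-matching rows contribute NULL, which SUM ignores.
null_default = conn.execute("""
    SELECT yearmonth,
           SUM(CASE WHEN Quintile = 2 THEN ret END) AS Q2_ret
    FROM quintile_returns
    GROUP BY yearmonth
    ORDER BY yearmonth
""").fetchall()

# ELSE 0: non-matching rows contribute 0 to the sum.
zero_default = conn.execute("""
    SELECT yearmonth,
           SUM(CASE WHEN Quintile = 2 THEN ret ELSE 0 END) AS Q2_ret
    FROM quintile_returns
    GROUP BY yearmonth
    ORDER BY yearmonth
""").fetchall()

print(null_default)
print(zero_default)
```

With the NULL default, the missing month stays NULL (None in Python), so it can be told apart from a genuine return of zero. With the 0 default the two cases look the same.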
Upvotes: 1