Reputation: 618
I am trying to compute the industry concentration of a portfolio using the Herfindahl-Index in SQL. My data contains the stock holdings at a certain point in time of a fund where each stock is linked to the industry depending on the issuing firm's business model. I want to implement the formula \sum_{i=1}^N s_i^2
, see Wikipedia.
I set up a minimal example with toy data on SQL Fiddle.
SELECT
fdate, fund, SUM(ind_share * ind_share) as hfi
FROM (
SELECT
a.fdate, a.fund, a.industry, SUM(amount)/b.fund_size AS ind_share
FROM holdings a
JOIN (
SELECT
fdate, fund, SUM(amount) AS fund_size
FROM
holdings
GROUP BY
fdate, fund) b
ON
a.fdate = b.fdate AND a.fund = b.fund
GROUP BY fdate, fund, industry) AS T
GROUP BY fdate, fund
This solution is ugly because it requires many sub-queries. For an easier solution, I would like to sequentially perform Group By
or assign a group sum to every group member.
Obviously, my dataset is much larger so I would like to optimize this code. Is there a more elegant solution without using sub-queries?
Thank you.
Upvotes: 0
Views: 82
Reputation: 15941
This is about as simple/succinct as it can get:
SELECT f.fdate, f.fund
, SUM(POW(fundind_size/fund_size, 2)) as hfi
FROM (
SELECT fdate, fund, SUM(amount) AS fund_size
FROM holdings
GROUP BY fdate, fund
) AS f
JOIN
(
SELECT fdate, fund, SUM(amount) fundind_size
FROM holdings
GROUP BY fdate, fund, industry
) AS fi
ON f.fdate = fi.fdate AND f.fund = fi.fund
GROUP BY f.fdate, f.fund
;
It's not much different than what you currently have, there are still three separate GROUPings, and two subqueries; but the subqueries are not quite as "deep" in this version.
Note: The latest versions of MySQL have added "window" functions, and "windowing" for aggregate functions. I haven't had much need for them, but it is possible they could help in this scenario.
Upvotes: 1