Reputation: 1040
I have a table like this:
id | date | balance |
---|---|---|
1 | 01.01 | 5 |
1 | 01.02 | 15 |
1 | 01.03 | 20 |
1 | 01.04 | 20 |
1 | 01.05 | 40 |
How can i calculate GINI coef on the last 4th values?
like this:
this code calculate GINI by id
with cte_1 as (
select *, ROW_NUMBER() over(partition by id order by balance) as rank
from #t6
)
SELECT id,
1 - 2 * sum((cast(balance as float) * (rank - 1) + balance / 2)) / count(*) / sum(balance) AS gini
FROM cte_1
GROUP BY id
ORDER BY id ASC
Upvotes: 0
Views: 74
Reputation: 10021
I didn't see any other responses to this. I expect most (including myself) are unfamiliar with the GINI calculation, and the sparse information in the question did not give them much to work on. In particular, not showing the original Excel formula was very limiting.
I took a dive into the mathematics, including looking at this writeup but was unable to reproduce your numbers with independent calculations. I was also unable to reproduce your results with your posted SQL. (See this fiddle - none of the results are as expected.)
However, since your question appears mainly about retrieving the last 4 values to feed each calculation, I can still help to a limited extent.
If you first assign sequence numbers to the input data, then for each sequence number 4 or greater, you can self join the data to itself selecting the 4 required rows. From there, you can sort (rank) the data and apply your GINI calculation.
Something like:
WITH CTE_Numbered AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Date) AS RowNum
FROM Data
),
CTE_Selector AS (
SELECT
N1.Id, N1.RowNum AS CalcNum,
N2.Rownum, N2.Date, N2.Balance,
ROW_NUMBER() OVER(PARTITION BY N1.Id, N1.RowNum ORDER BY N2.Balance) AS Rank
FROM CTE_Numbered N1
JOIN CTE_Numbered N2
ON N2.Id = N1.Id
AND N2.RowNum BETWEEN (N1.RowNum - 3) AND N1.RowNum
WHERE N1.RowNum >= 4
)
SELECT
Id, CalcNum,
MIN(RowNum) AS FromRow,
MAX(RowNum) AS ThruRow,
MIN(Date) AS FromDate,
MAX(Date) AS ThruDate,
--STRING_AGG(Balance, ', ') WITHIN GROUP(ORDER BY RowNum) AS SelectedValues,
STRING_AGG(Balance, ', ') WITHIN GROUP(ORDER BY Rank) AS OrderedValues,
--STRING_AGG(CONCAT(Rank,'[',RowNum,']=',Balance), ', ') WITHIN GROUP(ORDER BY Rank) AS OrderedValues,
1 - 2 * sum((cast(Balance as float) * (Rank - 1) + Balance / 2)) / count(*) / sum(Balance) AS gini
FROM CTE_Selector S
GROUP BY Id, CalcNum
ORDER BY Id, CalcNum;
Results (with some extra data):
Id | CalcNum | FromRow | ThruRow | FromDate | ThruDate | OrderedValues | gini |
---|---|---|---|---|---|---|---|
1 | 4 | 1 | 4 | 2023-01-01 | 2023-04-01 | 5, 15, 20, 20 | -0.208333333333333 |
1 | 5 | 2 | 5 | 2023-02-01 | 2023-05-01 | 15, 20, 20, 40 | -0.197368421052632 |
1 | 6 | 3 | 6 | 2023-03-01 | 2023-06-01 | 20, 20, 35, 40 | -0.16304347826087 |
1 | 7 | 4 | 7 | 2023-04-01 | 2023-07-01 | 20, 25, 35, 40 | -0.145833333333333 |
1 | 8 | 5 | 8 | 2023-05-01 | 2023-08-01 | 25, 35, 40, 55 | -0.153225806451613 |
2 | 4 | 1 | 4 | 2023-01-01 | 2023-04-01 | 50, 150, 200, 200 | -0.208333333333333 |
2 | 5 | 2 | 5 | 2023-02-01 | 2023-05-01 | 150, 200, 200, 400 | -0.197368421052632 |
See this db<>fiddle for a demo.
The final calculation does not match your expected result, but at least this should give you the 4 ordered input values that you need for each calculation. I'll leave it to you to fix the final calculation. (If you add your Excel formula to your original post, I may be able to help further.)
Upvotes: 1