Reputation: 403
I am creating a Power BI report using data from the https://www.mohfw.gov.in/ website, which provides the latest coronavirus data for all Indian states/union territories.
The data is in the following format:
+-----+-----------------------------+-----------+-------+-------+
| SNo | State                       | Confirmed | Cured | Death |
+-----+-----------------------------+-----------+-------+-------+
| 1   | Andaman and Nicobar Islands | 14        | 11    | 0     |
| 2   | Andhra Pradesh              | 603       | 42    | 15    |
| 3   | Arunachal Pradesh           | 1         | 0     | 0     |
| 4   | Assam                       | 35        | 12    | 1     |
| 5   | Bihar                       | 86        | 37    | 2     |
The website is refreshed with new data every day, so there is no date-wise tracker. I want to track the day-wise change (increments/decrements) in cases for every state. Is there any way I can model this in Power BI to achieve that?
For now, what I am doing is downloading the table from the web page every day, adding a date column with today's date (getdate()), and loading the data into a SQL table. So every day I insert a row for each state with that day's date stamp, and then I can subtract the previous day's figures to see the changes. But I feel this is an inefficient approach, and the table size keeps growing every day.
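A simplified sketch of what I am doing today (the table and column names here are illustrative, not my real schema):

    -- Daily load: stamp each scraped row with today's date.
    INSERT INTO dbo.StateCasesHistory (SNo, State, Confirmed, Cured, Death, LoadDate)
    SELECT SNo, State, Confirmed, Cured, Death, CAST(GETDATE() AS date)
    FROM dbo.StateCasesStaging;

    -- Day-wise change per state, comparing each snapshot to the previous one.
    SELECT State,
           LoadDate,
           Confirmed - LAG(Confirmed) OVER (PARTITION BY State ORDER BY LoadDate) AS ConfirmedChange,
           Cured     - LAG(Cured)     OVER (PARTITION BY State ORDER BY LoadDate) AS CuredChange,
           Death     - LAG(Death)     OVER (PARTITION BY State ORDER BY LoadDate) AS DeathChange
    FROM dbo.StateCasesHistory;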
Any suggestions to improve this, either through changes to the Power BI data model or on the SQL side, would be much appreciated.
Upvotes: 1
Views: 1117
Reputation: 3274
Considering the data source is updated according to SCD 1 (overwrite), the only way to track day-wise changes is to historize the data every day. In practice, schedule a daily load of the data source and store that day's snapshot.
You are implementing SCD 2 (create a new record on change) in the correct way. It is important to add a technical field to each record with the timestamp of when it was generated, so you can study the trend later.
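For example (a minimal sketch, reusing the illustrative names from the question rather than your real schema), the snapshot table can carry that technical field as a defaulted column:

    CREATE TABLE dbo.StateCasesHistory (
        SNo       int          NOT NULL,
        State     varchar(100) NOT NULL,
        Confirmed int          NOT NULL,
        Cured     int          NOT NULL,
        Death     int          NOT NULL,
        LoadDate  date         NOT NULL DEFAULT CAST(GETDATE() AS date),  -- technical field
        CONSTRAINT PK_StateCasesHistory PRIMARY KEY (State, LoadDate)
    );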
You could further optimize this approach by normalizing the model, in order to reduce the size of the table to which you apply SCD 2 (create a new record on change).
Let me give a simple example. Consider a table with 1,000 fields and a single record, where one of the fields is LAST_UPDATE.
If LAST_UPDATE changes 100,000 times a day, every day triggers the creation of 100,000 new versions of that record (because we track its changes). Therefore, after one year the table would still have 1,000 fields but 36,500,000 records. Instead, if we normalize the model so that the LAST_UPDATE field (historized with SCD 2) is stored in a separate table, after one year we would have one table with 1 record and 999 columns, and a different table with 1 column and 36,500,000 records.
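A minimal sketch of that normalized model (names are invented for illustration; note that in practice the history table also needs a key and a technical timestamp, so it ends up slightly wider than one column):

    -- Stable attributes: one record, ~999 columns.
    CREATE TABLE dbo.Entity (
        EntityId      int PRIMARY KEY,
        SomeAttribute varchar(100)  -- stand-in for the other slowly changing columns
    );

    -- Frequently changing field, historized with SCD 2: 36,500,000 records after a year.
    CREATE TABLE dbo.EntityLastUpdateHistory (
        EntityId    int      NOT NULL REFERENCES dbo.Entity (EntityId),
        LAST_UPDATE datetime NOT NULL,
        LoadTime    datetime NOT NULL DEFAULT GETDATE()  -- technical timestamp
    );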
If your database is a row-oriented database, you would benefit greatly from normalizing the model. If instead your database is columnar, this is already taken care of, because each column is compressed individually rather than row-wise.
Upvotes: 1