Reputation: 752
So I'm working on a Power BI file and want to add more data to it, but I can't understand how the file size ends up bigger than the sum of the data sources...
Can anyone explain how this works? Is this like doing an SQL join on two tables, making the result way bigger? I thought only the data itself would be saved in the file, and PBI would then join it in memory. But then maybe that would require too much memory.
Upvotes: 1
Views: 92
Reputation: 30289
Power BI uses the VertiPaq engine to store your data. It is a columnar database, and cardinality (the number of distinct values in a column) is everything. It uses value encoding, hash encoding (with a dictionary) and run-length encoding to compress the data to a very high level.
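To make the cardinality point concrete, here is a rough Python sketch of dictionary (hash) encoding followed by run-length encoding. It is only an illustration of the general techniques, not VertiPaq's actual implementation; the function names and sample columns are made up.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each value with a small integer index into a dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def run_length_encode(ids):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(key, len(list(group))) for key, group in groupby(ids)]

# Hypothetical columns: one with few distinct values, one where every value is unique.
low_card = ["FR", "FR", "FR", "UK", "UK", "US", "US", "US"]
high_card = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]

low_dict, low_ids = dictionary_encode(low_card)
high_dict, high_ids = dictionary_encode(high_card)

low_runs = run_length_encode(low_ids)
high_runs = run_length_encode(high_ids)

print(len(low_dict), len(low_runs))    # small dictionary, few runs
print(len(high_dict), len(high_runs))  # dictionary and runs as long as the column
```

With the low-cardinality column, 8 rows shrink to a 3-entry dictionary and 3 runs. With the high-cardinality column nothing shrinks at all, and the dictionary itself becomes extra overhead.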
Something like a datetime column will compress horribly, whereas if you split it into separate date and time columns, each will compress very well. It really depends on the cardinality of your data.
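A quick way to see why the split helps is to count distinct values before and after. The sketch below uses made-up data (one timestamp per minute for 30 days) purely for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical data: one timestamp per minute over 30 days.
start = datetime(2023, 1, 1, 0, 0, 0)
stamps = [start + timedelta(minutes=i) for i in range(30 * 24 * 60)]

distinct_datetimes = len(set(stamps))
distinct_dates = len({s.date() for s in stamps})
distinct_times = len({s.time() for s in stamps})

print(distinct_datetimes)  # 43200
print(distinct_dates)      # 30
print(distinct_times)      # 1440
```

One column with 43,200 distinct values becomes two columns with 30 and 1,440 distinct values, so the dictionaries are far smaller.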
What is the nature of the data you are trying to add? For instance, if you have lots of date columns, Power BI will secretly add a hidden date table for each. If one of the dates in a column is not parsed correctly, or your dates are far apart, you get a date table with a row for every date between the min and max, which could be quite a lot of additional data. You can disable this functionality in the options (the "Auto date/time" setting under Data Load).
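For a rough feel of how much a single bad date can inflate one of those hidden tables, here is a small Python estimate. It assumes the hidden table spans whole calendar years from the earliest to the latest date in the column; the helper name and the sample dates are hypothetical.

```python
from datetime import date

def auto_date_table_rows(dates):
    """Estimate rows in a hidden date table, assuming it covers full
    calendar years from the earliest to the latest date in the column."""
    first = date(min(dates).year, 1, 1)
    last = date(max(dates).year, 12, 31)
    return (last - first).days + 1

# Hypothetical column: all dates in 2023, plus one value mis-parsed as 1900-01-01.
clean = [date(2023, 3, 1), date(2023, 11, 15)]
dirty = clean + [date(1900, 1, 1)]

clean_rows = auto_date_table_rows(clean)
dirty_rows = auto_date_table_rows(dirty)

print(clean_rows)  # 365
print(dirty_rows)  # 45290
```

And that is per date column, so a model with many date columns multiplies the cost.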
Upvotes: 1