Why, when changing the double format to int64 in power bi, the memory occupied by a column decreases, but the total weight of the file increases?

I have a report in which one of the columns contains many unique integers. I saw that the data format is decimal (double) enter image description here, I decided to convert this column to int64 (integer) format. In dax studio, VertiPaq actually showed a decrease in the memory occupied by the column enter image description here, but when saving the report in the desktop version, the weight of the report itself as a file increased. How can this be? enter image description here


Upvotes: 1

Views: 186

Answers (1)

davidebacci
davidebacci

Reputation: 30289

Compression is based on a number of factors. The number of distinct values and the sort order of columns plays a big part. VertiPaq will also use some heuristics to calculate the optimum compression strategy for the data and this can change based on many different variables.

In your case, hash encoding has been used which requires a dictionary. This will also probably use run length encoding which means the sort order of the columns can make the memory footprint fluctuate.

Data compression in VertiPaq is quite involved. I can recommend Chapter 17 in the Definitive Guide to DAX if you want all the details. I quote the following excerpt.

The factors influencing the compression ratio of a Tabular model are, in order of importance:

  1. The cardinality of the column, which defines the number of bits used to store a value.
  2. The number of repetitions, that is, the distribution of data in a column. A column with many repeated values is compressed more than a column with very frequently changing values.
  3. The number of rows in the table.
  4. The data type of the column, which only affects the dictionary size. Given all these considerations, it is nearly impossible to predict the compression ratio of a table

Upvotes: 1

Related Questions