Reputation: 147
I am trying to run the TPC-DS benchmark with Spark SQL.
The documentation describes a star schema and a number of tables.
From my understanding of Hadoop, it is better to have denormalized data, and then you can use a format like Parquet, which compresses well (and use partitions for parallelism).
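To make the denormalization idea concrete, here is a minimal sketch in plain Python (not Spark; all table and column names are made up for illustration) of flattening a star schema, where each fact row is joined to its dimension rows so the result is one wide table. In Spark SQL the equivalent would be a JOIN performed once at load time, with the flat result written out as Parquet partitioned by a column such as year:

```python
# Toy star schema: one fact table plus two dimension tables.
# (Hypothetical data, for illustration only.)
fact_sales = [
    {"date_id": 1, "item_id": 10, "qty": 3},
    {"date_id": 2, "item_id": 11, "qty": 5},
]
dim_date = {1: {"year": 2016, "month": 1}, 2: {"year": 2016, "month": 2}}
dim_item = {10: {"item_name": "widget"}, 11: {"item_name": "gadget"}}

def denormalize(facts, dates, items):
    """Join each fact row to its dimension rows, producing one wide row."""
    flat = []
    for f in facts:
        row = dict(f)                     # start from the fact columns
        row.update(dates[f["date_id"]])   # pull in date attributes
        row.update(items[f["item_id"]])   # pull in item attributes
        flat.append(row)
    return flat

flat_sales = denormalize(fact_sales, dim_date, dim_item)
# Each row now carries all attributes, so queries need no joins;
# on disk you would partition such a table by e.g. "year"/"month".
```

The trade-off is the usual one: the flat table is larger and duplicates dimension values, but columnar storage like Parquet compresses the repeated values well, and queries avoid join cost.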
I also found this document from SAS -> https://support.sas.com/resources/papers/data-modeling-hadoop.pdf
which makes the same point. I am no data warehouse expert, so I would appreciate help understanding how to model data for a data warehouse on Hadoop.
Upvotes: 1
Views: 265