이대희
이대희

Reputation: 65

Hive partitioning vs multiple tables, What is difference?

I have a curiosity. I want to know the difference of hive partitioning and multiple tables. I've known that the purpose of partitioning is for reducing the query object. I thing this is same between partitioning and multiple tables. What's difference?

Upvotes: 0

Views: 902

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270371

Partitions and "parallel" tables are quite similar. In fact, SQL Server has an implementation of partitions using "partitioned views", which combine multiple tables into a single view. The difference between a partitioned table and multiple tables combined into one is both subtle and obvious, but boils down to this: a table is a table is a table.

What impact does this have? First, a user does not need to be aware of partitions in order to use a table. For data stored in multiple tables, a user needs to be aware of the table name, because SQL requires the name in the FROM clause.

Second, the database engine knows about partitions, but not necessarily how multiple tables are combined. A query that uses partitions chooses them in the WHERE clause rather than the FROM clause. The burden of understanding the data structure shifts from the user to the compiler.

There are other differences. Partitions generally have the same columns and types. Partitions generally have the same index and clustering properties. Partitions can be added dynamically, based on the contents of the data. Partitions can be added and removed without affecting existing queries.

Upvotes: 2

Related Questions