Reputation: 65
I have a curiosity. I want to know the difference of hive partitioning and multiple tables. I've known that the purpose of partitioning is for reducing the query object. I thing this is same between partitioning and multiple tables. What's difference?
Upvotes: 0
Views: 902
Reputation: 1270371
Partitions and "parallel" tables are quite similar. In fact, SQL Server has an implementation of partitions using "partitioned views", which combine multiple tables into a single view. The difference between a partitioned table and multiple tables combined into one is both subtle and obvious, but boils down to this: a table is a table is a table.
What impact does this have? First, a user does not need to be aware of partitions in order to use a table. For data stored in multiple tables, a user needs to be aware of the table name, because SQL requires the name in the FROM
clause.
Second, the database engine knows about partitions, but not necessarily how multiple tables are combined. A query that uses partitions chooses them in the WHERE
clause rather than the FROM
clause. The burden of understanding the data structure shifts from the user to the compiler.
There are other differences. Partitions generally have the same columns and types. Partitions generally have the same index and clustering properties. Partitions can be added dynamically, based on the contents of the data. Partitions can be added and removed without affecting existing queries.
Upvotes: 2