Reputation: 800
I have about 200,000 distinct values for the column "id", and I have used it as the partition key in one of my dynamically partitioned Hive tables.
The partitions have been created, but when I query the table with a simple Select *, it always returns the following error:
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Can anyone tell me why?
Upvotes: 2
Views: 2268
Reputation: 1376
200,000 dynamic partitions is too many for Hive. Try reducing the number of partitions.
Upvotes: 0
Reputation: 419
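You could split the select * into several queries, each over a range of id. For example (your_table is a placeholder for your table name):
select * from your_table where id > 0 and id < 50000
select * from your_table where id >= 50000 and id <= 100000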
...
This helps because Hive allocates a portion of memory for each dynamic partition. Querying one range at a time needs less memory, although the whole process takes longer.
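If there are many ranges, writing each query by hand gets tedious. Below is a minimal Python sketch that generates one query per id range. It assumes integer ids, and the table name my_table, the bounds 0 to 200000 and the step 50000 are placeholders, not values from the question. It uses half-open ranges (>= start, < end) so that no id falls into two batches.

```python
def range_queries(table, column, lower, upper, step):
    """Yield one SELECT per half-open range [start, end) covering [lower, upper)."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(lower, upper, step):
        # Clamp the last range so it never runs past the upper bound.
        end = min(start + step, upper)
        yield (
            f"SELECT * FROM {table} "
            f"WHERE {column} >= {start} AND {column} < {end}"
        )


# Hypothetical table name and bounds, for illustration only.
queries = list(range_queries("my_table", "id", 0, 200000, 50000))
for query in queries:
    print(query)
```

Each generated statement can then be run on its own, for example through the Hive CLI or Beeline.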
Upvotes: 0
Reputation: 167
Make use of the (newly introduced) indexing feature of Hive on the column 'id'. Partitioning is not a good idea when it creates too many partitions, because it increases the load on the name node, which has to track each partition that is created.
Upvotes: 2