wazza
wazza

Reputation: 800

Many Partitions in Hive

I have about 200,000 distinct values for the column "id" and I have used it as a partition key in one of the dynamically partitioned Hive table.

Now the partitions are created and when I try to query (I have used simple Select * query), it always returns following error:

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

Can anyone tell me why?

Upvotes: 2

Views: 2268

Answers (3)

pedram bashiri
pedram bashiri

Reputation: 1376

200,000 is too many dynamic partitions for hive. Try reducing the number of partitions.

Upvotes: 0

HakkiBuyukcengiz
HakkiBuyukcengiz

Reputation: 419

You could select * from dividing the id. For example;

select * from where id >0 and id < 50000
select * from where id >=50000 and id =< 100000
...

Because for each dynamic partition, hive allocates a memory portion. This style of querying would need less memory, however the whole process takes more time.

Upvotes: 0

Shivaprasad
Shivaprasad

Reputation: 167

Make use of Indexing feature of Hive(newly introduced) on column 'id'. Partitioning is not a good idea when too many partitions are getting creating, it's increases the load on the name node to track each of the partition created.

Upvotes: 2

Related Questions