user2531569

Reputation: 619

Criterion for MapReduce jobs getting launched in Hive

I am a newbie to Hadoop, so please help me with this basic question.

When I run "select * from table where <condition>;" in Hive, I understand that it launches a MapReduce job, since it needs to apply filtering to the underlying HDFS files.

But when I run "select * from table;" without any where clause in Hive, MapReduce is sometimes launched and sometimes not. My understanding is that ideally it shouldn't launch MapReduce, as there is no filtering condition.

So could someone explain why MapReduce gets launched in some of these cases?

Thanks in advance.

Upvotes: 2

Views: 284

Answers (1)

franklinsijo

Reputation: 18270

This is controlled by two Hive properties:

  • hive.fetch.task.conversion
  • hive.fetch.task.conversion.threshold

A simple SELECT query runs as a fetch task instead of a MapReduce task when hive.fetch.task.conversion is set to anything other than none (that is, to minimal or more).

But if the total size of the files in the table exceeds the number of bytes set in hive.fetch.task.conversion.threshold, a MapReduce task is triggered instead.

The default value of hive.fetch.task.conversion.threshold is 1073741824 bytes (1 GB) in Hive 0.14.0 and later.
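To see how the two properties interact, you could try something like the following in a Hive session. This is only a sketch: my_table is a placeholder table name, and the values shown are examples rather than recommendations.

```sql
-- Print the values currently in effect for this session
SET hive.fetch.task.conversion;
SET hive.fetch.task.conversion.threshold;

-- Disable fetch conversion: even a plain SELECT * should now launch a job
SET hive.fetch.task.conversion=none;
SELECT * FROM my_table LIMIT 10;   -- my_table is a hypothetical table

-- Re-enable fetch conversion and set the size threshold to 1 GB (in bytes)
SET hive.fetch.task.conversion=more;
SET hive.fetch.task.conversion.threshold=1073741824;
SELECT * FROM my_table LIMIT 10;   -- served by a fetch task if the input is under the threshold

-- Inspect the plan without running the query
EXPLAIN SELECT * FROM my_table;
```

If the EXPLAIN output contains only a stage with a Fetch Operator, the query should run without MapReduce; if it also lists a Map Reduce stage, a job will be launched. These settings apply only to the current session unless they are set in hive-site.xml.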

Upvotes: 1
