Criterion for MapReduce jobs getting launched in Hive

Question

I am a newbie to Hadoop, so please help me with this basic question.

When I run "select * from table where ;" in Hive, I understand that it launches a MapReduce job because it needs to apply the filter to the underlying HDFS files.

But when I run "select * from table" without any WHERE clause in Hive, sometimes a MapReduce job is launched and sometimes it isn't. My understanding is that, ideally, it shouldn't launch MapReduce, since there is no filter condition.

So could someone explain why MapReduce is launched in some of these cases?

Thanks in Advance.

franklinsijo · Accepted Answer

This is controlled by two Hive properties:

hive.fetch.task.conversion
hive.fetch.task.conversion.threshold

A simple SELECT query runs as a fetch task instead of a MapReduce job when hive.fetch.task.conversion is set to anything other than none.

But if the total size of the table's files exceeds the number of bytes set in hive.fetch.task.conversion.threshold, a MapReduce job is triggered instead.

The default value of hive.fetch.task.conversion.threshold is 1073741824 bytes (1 GB) in Hive 0.14.0 and later.
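As a rough sketch, the two properties can be inspected and changed per session from the Hive CLI or Beeline. The table name my_table and the 2 GB threshold below are only illustrative assumptions, not recommended values.

```sql
-- Print the current values of the two properties.
SET hive.fetch.task.conversion;
SET hive.fetch.task.conversion.threshold;

-- Allow simple SELECT queries to run as a fetch task in this session.
SET hive.fetch.task.conversion=more;

-- Raise the size limit to 2 GB (2 * 1073741824 bytes); illustrative value only.
SET hive.fetch.task.conversion.threshold=2147483648;

-- my_table is a hypothetical table name. If the query qualifies for conversion,
-- the plan should show a fetch stage rather than a map-reduce stage.
EXPLAIN SELECT * FROM my_table;

-- To always launch MapReduce, even for a plain SELECT *, disable the conversion.
SET hive.fetch.task.conversion=none;
```

Running EXPLAIN before and after changing the settings is one way to check which path Hive would take for a given table without actually executing the query.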
