Full table scan issue with LEFT OUTER JOIN in Hive

Question

I'm trying to a LEFT OUTER JOIN operation on 2 of my tables in hive. Could understand that we have include filter conditions along with the join conditions in case of joins, imitting them from where conditions to avoid full table scans. Reference: https://gist.github.com/randyzwitch/9abeb66d8637d1a0007c

Inspite of doing this, my query is resulting in a high number of mappers and reducers as if it is doing a full table scan.

Here is my query and explain plan. I'm not good at understanding this explain plan. m.date_id and d.REC_CREATED_DATE are the partitioned columns in the respective tables so it should actually scan only these partitions.

Any suggestions to improve my query would be of great help.

hive> EXPLAIN SELECT m.execution_id
> ,m.operation_name
> ,m.return_code
> ,m.explanation
> ,d.REC_CREATED_DATE
> FROM web_log_master m  LEFT OUTER JOIN web_log_detail d
> on (m.execution_id = d.execution_id AND m.date_id='2015-07-14' and d.REC_CREATED_DATE='2015-07-14') ;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME web_log_master) m) (TOK_TABREF (TOK_TABNAME web_log_detail) d) (and (AND (= (. (TOK_TABLE_OR_COL m) execution_id) (. (TOK_TABLE_OR_COL d) execution_id)) (= (. (TOK_TABLE_OR_COL m) date_id) '2015-07-14')) (= (. (TOK_TABLE_OR_COL d) REC_CREATED_DATE) '2015-07-14')))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL m) execution_id)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL m) operation_name)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL m) return_code)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL m) explanation)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) REC_CREATED_DATE)))))

STAGE DEPENDENCIES:
  Stage-4 is a root stage , consists of Stage-1
  Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-4
  Conditional Operator

  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
    d
      TableScan
        alias: d
        Reduce Output Operator
          key expressions:
                expr: execution_id
                type: string
          sort order: +
          Map-reduce partition columns:
                expr: execution_id
                type: string
          tag: 1
          value expressions:
                expr: rec_created_date
                type: string
    m
      TableScan
        alias: m
        Reduce Output Operator
          key expressions:
                expr: execution_id
                type: string
          sort order: +
          Map-reduce partition columns:
                expr: execution_id
                type: string
          tag: 0
          value expressions:
                expr: execution_id
                type: string
                expr: operation_name
                type: string
                expr: return_code
                type: string
                expr: explanation
                type: string
                expr: date_id
                type: string
  Reduce Operator Tree:
    Join Operator
      condition map:
           Left Outer Join0 to 1
      condition expressions:
        0 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3}
        1 {VALUE._col3}
      filter predicates:
        0 {(VALUE._col13 = '2015-07-14')}
        1
      handleSkewJoin: false
      outputColumnNames: _col0, _col1, _col2, _col3, _col19
      Select Operator
        expressions:
              expr: _col0
              type: string
              expr: _col1
              type: string
              expr: _col2
              type: string
              expr: _col3
              type: string
              expr: _col19
              type: string
        outputColumnNames: _col0, _col1, _col2, _col3, _col4
        File Output Operator
          compressed: false
          GlobalTableId: 0
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1


Time taken: 13.616 seconds, Fetched: 90 row(s)

Full table scan issue with LEFT OUTER JOIN in Hive

Answers (1)

Related Questions