sam
sam

Reputation: 141

Why main table and temporary table giving different results?

I have one temporary external table and have put the data from HDFS into this table. Now I am inserting the same data into my partition main external table. The data gets inserted successfully, but when I am querying the main table using columns I'm getting different values for the columns.

I have loaded the data using csv file into my temporary that contains four fields.

col1=id
col2=visitDate
col3=comment
col4=age

Below are the queries and their results:

Temporary table:

create external table IF NOT EXISTS  dummy1(id string,visitDate string,comment string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;

MAIN Table:

create external table IF NOT EXISTS  dummy1(id string,comment string)
PARTITIONED BY (visitDate string, age string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC
;


Result:

Temporary table:

select *from incr_dummy1;

1       11      a       20
2       12      b       3
1       13      c       34
4       14      d       23
5       15      e       45
6       16      f       65
7       17      g       78
8       18      h       9
9       19      i       12
10      20      j       34

select visitDate,age from incr_dummy1;

11      20
12      3
13      34
14      23
15      45
16      65
17      78
18      9
19      12
20      34


Main Table:

select *from dummy1;

1       11      a       20
2       12      b       3
1       13      c       34
4       14      d       23
5       15      e       45
6       16      f       65
7       17      g       78
8       18      h       9
9       19      i       12
10      20      j       34

select visitDate,age from dummy1;

a       20
b       3
c       34
d       23
e       45
f       65
g       78
h       9
i       12
j       34

so in above main external table,the value of "comment" column is coming when I'm querying for "visitDate" column.

Please let me know what mistake I'm doing here?

Upvotes: 0

Views: 35

Answers (1)

notNull
notNull

Reputation: 31490

As i can see column orders are not same in temporary and final tables.

While inserting data from Temporary table to final table check you are having the correct order of columns in select statement(partition cols needs to be at the end of select cols).

hive> insert into dummy1 partition(visitDate,age) select id,comment,visitDate,age from incr_dummy1;

Just in case if you are still having issues then its better to check

  • As you are having external partitioned table (when we drop the table data will not be dropped on HDFS), check the hdfs directory is there any extra files that are not been deleted.

  • Then drop the table, delete the hdfs directory and create the table then run your job again.


Update:

Option1:

is it possible to match columns order in temporary table with final table, if yes then change the order of columns.

Option2:

use subquery with quoted identifier to exclude the original columns and get only the alias columns into our final select query.

hive> set hive.support.quoted.identifiers=none;
hive> insert into dummy1 partition(visitDate,age)
      select `(visitDate|age)?+.+` from --exlude visitDate,age columns. 
     (select *,visitDate vis_dat,age age_n from incr_dummy1)t;

Upvotes: 1

Related Questions