Partitioning in Hive table

Question

From a blog, in a Hive's PARTITIONING FUNCTIONS topic, they had created a table in the below schema.

    CREATE TABLE sales (
    sales_order_id  BIGINT,
    order_amount    FLOAT,
    order_date      STRING,
    due_date        STRING,
    customer_id     BIGINT
)
PARTITIONED BY (country STRING, year INT, month INT, day INT) ;

and inserting partitioned data into the sales table as below.

INSERT INTO sales
PARTITION (country = 'US', year = 2012, month = 12, day = 22)
SELECT  sales_order_id,
        order_amount,
        due_date,
        customer_id,
        cntry,
        yr,
        mo,
        d
FROM    source_view
WHERE   cntry = 'US'
        AND yr = 2012 
        AND mo = 12
        AND d = 22 ;

My doubt araise here. Since there is no field like cntry or yr or mo or d in the create table schema, how in the insert query they have mentioned cntry or yr or mo or d. How it is possible?. does the author made any mistakes in the query or is there any option available like alias.

Santiago Cepas · Accepted Answer

Although the partitioning fields are not explicitly created as columns, they are (pseudo)columns and can be queried.

From Hive's docs:

the partition you create makes a pseudocolumn on which you can query

Also (from the Insert documentation):

Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.

Partitioning in Hive table

Answers (2)

Related Questions