Reputation: 658

Hive Bucketing without Partitions?

I'm trying to use a bucket map join for queries in my star schema. I have some small dimensions tables and one big fact table. I would:

bucket the fact table by its FK keys
bucket dimensions by their ID keys

But if I try to bucket a table without any partitions I can see only 1 file in table folder after the Insert of the data (num_files: 1).

CREATE TABLE user_dimension (
id STRING,
...
name STRING)
CLUSTERED BY (id) INTO 24 BUCKETS;

>> OK

INSERT INTO TABLE user_dimension
SELECT id, name
FROM datasource;

>> Table user_dimension stats: [num_partitions: 0, num_files: 1, num_rows: 478, total_size: 36497, raw_data_size: 36019]

Upvotes: 2

Answers (2)

Viraj Wadate

Reputation: 6193

Bucketing can also be done even without partitioning on Hive tables.

CREATE TABLE bucketed_table(
    firstname VARCHAR(64),
    lastname VARCHAR(64),
    address STRING,
    city VARCHAR(64),
    state VARCHAR(64),
    web STRING
) 
CLUSTERED BY (state) 
SORTED BY (city) 
INTO 32 BUCKETS STORED AS SEQUENCEFILE;

Upvotes: 0

Carter Shanklin

Reputation: 3047

When bucketing data, make sure you set

hive.enforce.bucketing=true;

before you insert data.

(If your dimension tables are really small, you can probably just use the regular map join without the need for bucketing. Bucketing is appropriate when you need to join 2 large tables.)

Upvotes: 3

Hive Bucketing without Partitions?

Answers (2)

Related Questions