Reputation: 658
I'm trying to use a bucket map join for queries in my star schema. I have some small dimensions tables and one big fact table. I would:
But if I try to bucket a table without any partitions I can see only 1 file in table folder after the Insert of the data (num_files: 1).
CREATE TABLE user_dimension (
id STRING,
...
name STRING)
CLUSTERED BY (id) INTO 24 BUCKETS;
>> OK
INSERT INTO TABLE user_dimension
SELECT id, name
FROM datasource;
>> Table user_dimension stats: [num_partitions: 0, num_files: 1, num_rows: 478, total_size: 36497, raw_data_size: 36019]
Upvotes: 2
Views: 7900
Reputation: 6193
Bucketing can also be done even without partitioning on Hive tables.
CREATE TABLE bucketed_table(
firstname VARCHAR(64),
lastname VARCHAR(64),
address STRING,
city VARCHAR(64),
state VARCHAR(64),
web STRING
)
CLUSTERED BY (state)
SORTED BY (city)
INTO 32 BUCKETS STORED AS SEQUENCEFILE;
Upvotes: 0
Reputation: 3047
When bucketing data, make sure you set
hive.enforce.bucketing=true;
before you insert data.
(If your dimension tables are really small, you can probably just use the regular map join without the need for bucketing. Bucketing is appropriate when you need to join 2 large tables.)
Upvotes: 3