Reputation: 77
Main Table:
CREATE TABLE ca_ger.cln_trans_base (
store_code VARCHAR(25),
year_week varchar(25),
division VARCHAR(25) encode lzo,
market_share_code VARCHAR(25) encode bytedict,
pon VARCHAR(25) encode lzo,
osp_price REAL encode bytedict
)
diststyle key distkey(store_code)
sortkey(store_code,year_week);
COMMIT;
Second table:
CREATE TABLE ca_ger.divisions(
store_code VARCHAR(25),
year_week varchar(25),
division VARCHAR(25) encode lzo
)
diststyle key distkey(store_code)
sortkey(store_code,year_week);
COMMIT;
alter table ca_ger.cln_trans_base
add column division varchar(25);
Join clause:
update ca_ger.cln_trans_base
set division=b.division
from ca_ger.cln_trans_base a
join divisions b on a.year_week=b.year_week;
After this, I ran VACUUM and ANALYZE. Then I checked the table statistics and the row count had doubled.
Upvotes: 0
Views: 432
Reputation: 5719
In Amazon Redshift, an UPDATE is implemented as a DELETE plus an INSERT: the old row is marked for deletion and a new row version is written. Since you appear to be updating every row, the table's row count doubles until the deleted versions are reclaimed.
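You can see this directly in Redshift's `SVV_TABLE_INFO` system view: its `tbl_rows` column includes rows that are marked for deletion but not yet vacuumed. A quick check (table name taken from the question):

```sql
-- tbl_rows counts row versions marked for deletion that have not yet
-- been vacuumed, so after a full-table UPDATE it shows roughly double
-- the logical row count.
SELECT "table", tbl_rows, size, pct_used
FROM svv_table_info
WHERE "table" = 'cln_trans_base';
```

Running this again after the VACUUM below should show the count back at its original level.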
After every large update, or on a scheduled basis, you should run a full VACUUM, as suggested in one of the comments:
VACUUM FULL ca_ger.cln_trans_base;
This will reclaim the deleted row versions and bring the table back to its original size. Hope this helps.
Upvotes: 0
Reputation: 269141
Extending @a_horse's suggestion, the UPDATE statement should not repeat the table being updated. Instead, use:
UPDATE ca_ger.cln_trans_base a
SET division = b.division
FROM divisions b
WHERE a.year_week = b.year_week;
See: PostgreSQL UPDATE Join with A Practical Example
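Putting the two answers together, a sketch of the full corrected sequence (assuming the `division` column has already been added as in the question):

```sql
-- Reference the target table only once; the self-join in the original
-- statement is what multiplied the row versions.
UPDATE ca_ger.cln_trans_base a
SET division = b.division
FROM ca_ger.divisions b
WHERE a.year_week = b.year_week;

-- Reclaim the old row versions left behind by the UPDATE,
-- then refresh the planner statistics.
VACUUM FULL ca_ger.cln_trans_base;
ANALYZE ca_ger.cln_trans_base;
```

Note that if `year_week` matches more than one row in `divisions`, which row supplies the `division` value is not deterministic, so you may also want a more selective join condition (for example, including `store_code`).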
Upvotes: 1