Reputation: 2616
I have a table like below:
Geography Period Campaign VariableName VariableValue
11111 4/1/2017 ABC TV_Imp 0.4
11111 4/2/2017 ABC TV_Imp 0.5
11112 4/1/2017 ABC TV_Imp 0.1
11113 4/2/2017 ABC TV_Clicks 0.7
11113 4/2/2017 CDE TV_Clicks 0.7
11113 4/3/2017 FGH TV_Imp 0.5
I'd like to transform/pivot that table to:
Geography Period ABC_TV_Imp ABC_TV_Clicks CDE_TV_Clicks FGH_TV_Imp
11111 4/1/2017 0.4 0 0 0
11111 4/2/2017 0.5 0 0 0
11112 4/1/2017 0.1 0 0 0
11113 4/2/2017 0 0.7 0 0
11113 4/2/2017 0 0 0.7 0
11113 4/3/2017 0 0 0 0.5
The problem here is that I do not know ahead of time what values under Campaign
and VariableName
columns are going to be. They could range from ~100-400 distinct values for each. Since Vertica also doesn't have PIVOT
API nor is it possible to write dynamic SQL there, I'm wondering what is the best way to create a pivot table in Vertica.
Also, because the original/source table has about 180m rows, what would be the most efficient (computationally speaking) way to do the pivoting. I thought about splitting the original table by three or more (grouping by Geography
, Period
, Campaign
, and VaribleName
and splitting by Campaign
and VariableName
) and launching the individual pivotting process within each group before merging them back together. Effectively, I hope to 'parallalize' the pivoting process to speed up the time. Would that help?
Thank you in advanced for your suggestions/advice!
Upvotes: 1
Views: 8942
Reputation: 9
create table pivotst
(
master_country varchar(25)
);
select max(decode(rn, 1, master_country, null)) || ', ' ||max(decode(rn, 2, master_country, null)) || ', ' ||max(decode(rn, 3, master_country, null)) || ', ' ||max(decode(rn, 4, master_country, null)) || ', ' ||max(decode(rn, 5, master_country, null)) master_country from (select master_country, row_number() over () rn from pivotst) foo;
INSERT INTO pivotst values ('Hyderabad');
INSERT INTO pivotst values ('chennai');
select * from pivotst;
select max(decode(rn, 1, master_country, null)) || ', ' ||max(decode(rn, 2, master_country, null)) || ', ' ||max(decode(rn, 3, master_country, null)) || ', ' ||max(decode(rn, 4, master_country, null)) || ', ' ||max(decode(rn, 5, master_country, null)) master_country from (select master_country, row_number() over () rn from pivotst) foo;
Upvotes: 0
Reputation: 1269633
This answer explains how to generate the SQL you need. After generating the SQL, you need to run it.
First do:
select distinct campaign, variable_name
from t;
This generates all your code. Put this in a spreadsheet and use formulas to create the line:
(case when campaign = $campaign and variable_name = $variable_name then variable_value else 0 end) as $campaign_$variable_name,
You can also do this in SQL as:
select distinct
replace(replace('(case when campaign = ''$campaign'' and variable_name = ''$variable_name'' then variable_value else 0 end) as $campaign_$variable_name,',
'$campaign', campaign
), $variable_name, variable_name
)
from t;
This is most of the SQL you need to write. Do the following:
select geography, period,
before the first linefrom t group by geography, period
Upvotes: 2