user1330974
user1330974

Reputation: 2616

Pivoting in Vertica

I have a table like below:

Geography        Period        Campaign        VariableName        VariableValue
11111            4/1/2017      ABC             TV_Imp              0.4
11111            4/2/2017      ABC             TV_Imp              0.5
11112            4/1/2017      ABC             TV_Imp              0.1
11113            4/2/2017      ABC             TV_Clicks           0.7
11113            4/2/2017      CDE             TV_Clicks           0.7
11113            4/3/2017      FGH             TV_Imp              0.5

I'd like to transform/pivot that table to:

Geography    Period    ABC_TV_Imp    ABC_TV_Clicks    CDE_TV_Clicks    FGH_TV_Imp
11111        4/1/2017  0.4           0                0                0
11111        4/2/2017  0.5           0                0                0
11112        4/1/2017  0.1           0                0                0
11113        4/2/2017  0             0.7              0                0
11113        4/2/2017  0             0                0.7              0
11113        4/3/2017  0             0                0                0.5

The problem here is that I do not know ahead of time what values under Campaign and VariableName columns are going to be. They could range from ~100-400 distinct values for each. Since Vertica also doesn't have PIVOT API nor is it possible to write dynamic SQL there, I'm wondering what is the best way to create a pivot table in Vertica.

Also, because the original/source table has about 180m rows, what would be the most efficient (computationally speaking) way to do the pivoting. I thought about splitting the original table by three or more (grouping by Geography, Period, Campaign, and VaribleName and splitting by Campaign and VariableName) and launching the individual pivotting process within each group before merging them back together. Effectively, I hope to 'parallalize' the pivoting process to speed up the time. Would that help?

Thank you in advanced for your suggestions/advice!

Upvotes: 1

Views: 8942

Answers (2)

surendra
surendra

Reputation: 9

create table pivotst ( master_country varchar(25) ); select max(decode(rn, 1, master_country, null)) || ', ' ||max(decode(rn, 2, master_country, null)) || ', ' ||max(decode(rn, 3, master_country, null)) || ', ' ||max(decode(rn, 4, master_country, null)) || ', ' ||max(decode(rn, 5, master_country, null)) master_country from (select master_country, row_number() over () rn from pivotst) foo; INSERT INTO pivotst values ('Hyderabad'); INSERT INTO pivotst values ('chennai');

select * from pivotst;

select max(decode(rn, 1, master_country, null)) || ', ' ||max(decode(rn, 2, master_country, null)) || ', ' ||max(decode(rn, 3, master_country, null)) || ', ' ||max(decode(rn, 4, master_country, null)) || ', ' ||max(decode(rn, 5, master_country, null)) master_country from (select master_country, row_number() over () rn from pivotst) foo;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269633

This answer explains how to generate the SQL you need. After generating the SQL, you need to run it.

First do:

select distinct campaign, variable_name
from t;

This generates all your code. Put this in a spreadsheet and use formulas to create the line:

(case when campaign = $campaign and variable_name = $variable_name then variable_value else 0 end) as $campaign_$variable_name,

You can also do this in SQL as:

select distinct 
       replace(replace('(case when campaign = ''$campaign'' and variable_name = ''$variable_name'' then variable_value else 0 end) as $campaign_$variable_name,',
                        '$campaign', campaign
                      ), $variable_name, variable_name
              )
from t;

This is most of the SQL you need to write. Do the following:

  • Put select geography, period, before the first line
  • Remove the final comma
  • Add from t group by geography, period
  • Run the SQL statement

Upvotes: 2

Related Questions