Jeffrey Bane
Jeffrey Bane

Reputation: 592

Trying to accomplish without dynamic SQL (sql server)

All,

I'm trying to pull off an insert from one table to another without using dynamic sql. However, the only solutions I'm coming up with at the moment use dynamic sql. It's been tricky to search for any similar scenarios.

Here are the details:

My starting point is the following legacy table:

CREATE TABLE [dbo].[_Combinations](
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Red')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Orange')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Yellow')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Green')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Blue')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Indigo')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Violet')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'A')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'B')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'C')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'D')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'E')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'F')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'G')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'H')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'I')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'J')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'K')

SELECT * FROM _Combinations

The _Combinations table contains a key for different types of attributes (AttributeID) and the possible values for each attribute (Value).

In this case, there are 3 different attributes with multiple possible values, however there can be many more (up to 10).

The requirement is then to create every possible combination of each value and store it normalized, as there will be other data stored with each possible combination. I need to store both the attribute keys and values that make up each combination, so it's not just a simple cross join to display each combination. The target table for storing each combination of attributes is this:

CREATE TABLE [dbo].[_CombinedAttributes](
[GroupKey] [int] NULL,
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]

So attribute combination records using the above data would look like this in the target table:

GroupKey    AttributeID Value
1               8         A
1               16        1
1               28        Red
2               8         B
2               16        1
2               28        Red

This gives me what I need. Each group has an identifier and I can track the attributeIDs and values that make up each group. I'm using two scripts to get from the _Combinations table to the format of the _CombinedAttributes table:

-- SCRIPT #1
SELECT Identity(int) AS RowNumber, * INTO #Test
FROM (
SELECT AttributeID AS Attribute1, Value AS Value1 FROM _Combinations WHERE AttributeID = 8) C1
CROSS JOIN 
(
SELECT AttributeID AS Attribute2, Value AS Value2 FROM _Combinations WHERE AttributeID = 16) C2
CROSS JOIN
(
SELECT AttributeID AS Attribute3, Value AS Value3 FROM _Combinations WHERE AttributeID = 28) C3

-- SCRIPT #2

INSERT INTO _CombinedAttributes
SELECT RowNumber AS GroupKey, Attribute1, Value1 
FROM #Test
UNION ALL
SELECT RowNumber, Attribute2, Value2 
FROM #Test
UNION ALL
SELECT RowNumber, Attribute3, Value3
FROM #Test
ORDER BY RowNumber, Attribute1

The above two scripts work, but obviously there's some drawbacks. Namely I need to know how many attributes I'm dealing with and there's hard coding of IDs, so I can't generate this on the fly. The solution I came up with is I build the strings for Script 1 and Script 2 by looping through the attributes in the the _Combinations table and generate execution strings which is long and messy but I can post if needed. Can anyone see a way to pull off the format for the final insert without dynamic sql?

This routine wouldn't be run very much, but it's going to be run enough that I'd like to not be doing any execute string building and use straight SQL.

Thanks in advance.

UPDATE:

When I use a second dataset, Gordon's code is no longer returning correct results, it's creating groups with only 1 attribute near the end, however on this second dataset I get the correct rowcount with Nathan's routine (row count on final result should be 396). But as I stated on the comments, if I use the first dataset, I get the opposite result, Gordon's returns correctly, but Nathan's code has dups. I'm at a loss. Here is the second data set:

DROP TABLE [dbo].[_Combinations] GO

CREATE TABLE [dbo].[_Combinations]( [AttributeID] [int] NULL, [Value] varchar NULL ) ON [PRIMARY] GO

INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'<=39')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'40-44')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'45-49')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'50-54')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'55-64')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'65+')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'AA')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'JJ')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'CC')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'DD')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'EE')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'KK')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'BB')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'FF')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'GG')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'HH')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'II')

Upvotes: 4

Views: 374

Answers (5)

Edward
Edward

Reputation: 1

Regarding the issue with exp(sum(log(count(*))) over ()), the answer for me seemed to be to introduce the ROUND function to the mix. Thus, the following snippet seems to produce a reliable answer (so far at least):

ROUND(exp(sum(log(count(*))) over ()), 0)

Upvotes: 0

FrankPl
FrankPl

Reputation: 13315

Recursive Solution

The following is a recursive solution, SQLFiddle is here:

with a as ( -- unique AttributeIDs
  select AttributeID
        ,Row_Number() over(order by AttributeID) as rowNo
        ,count(*) as cnt
    from [dbo].[_Combinations]
  group by AttributeID
),
r as (
  -- start recursion: list all values of the first attribute
  select Dense_Rank() over(order by c.[Value]) - 1 as GroupKey
        ,c.AttributeID
        ,c.[Value]
        ,a.cnt as factor
        ,1 as level
    from a
         join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
   where a.rowNo = 1

  union all

  -- recursion step: add the combinations with the values of the next attribute
  select GroupKey
        ,case when AttributeID = 'prev' then prevAttribID else currAttribID end as AttributeID
        ,[Value]
        ,factor
        ,level
    from (select r.Value as prev
                ,c.Value as curr
                ,(Dense_Rank() over(order by c.[Value]) - 1) * r.factor + r.GroupKey as GroupKey
                ,r.level + 1 as level
                ,r.factor * a.cnt as factor
                ,r.AttributeID as prevAttribID
                ,a.AttributeID as currAttribID
            from r
                 join a on r.level + 1 = a.rowNo
                 join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
         ) as p
         unpivot ( Value for AttributeID in (prev, curr)) as up
)
-- get result: this is the data from the deepest level
select distinct
       GroupKey + 1 as GroupKey -- start with one instead of zero
      ,AttributeID
      ,[Value]
  from r
 where level = (select count(*) from a)
order by GroupKey, AttributeID, [Value]

Dynamic Solution

And this is a slightly shorter version using a dynamic statement:

declare @stmt varchar(max);
with a as ( -- unique attribute keys, cast here to avoid casting when building the dynamic statement
  select distinct cast(AttributeID as varchar(10)) as ID
    from [dbo].[_Combinations]
)
select @stmt = 'select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
  from
  (
  select '
  + (select ' C' + ID + '.Value as V' + ID + ', ' from a for xml path(''))
  + ' Row_Number() over(order by '
  + stuff((select ', C' + ID + '.Value' from a for xml path('')), 1, 2, '')
  + ') AS GroupKey from '
  + stuff((select ' cross join [dbo].[_Combinations] as C' + ID from a for xml path('')), 1, 11, '')
  + ' where ' 
  + stuff((select ' and C' + ID + '.AttributeID = ' + ID from a for xml path('')), 1, 4, '')
  + ')  as p unpivot (Value for AttributeIDStr in ('
  + stuff((select ', V' + ID from a for xml path('')), 1, 2, '')
  + ')) as up'
;
exec (@stmt)

As SQL Server does not have the nice list aggregate function that other databases have, one must use the ugly stuff((select ... for xml path(''))) expression.

The statement produced for the sample data is - apart from whitespace differences - the following:

select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
  from
  (
  select C16.Value as V16
        ,C28.Value as V28
        ,C8.Value  as V8
        ,Row_Number() over(order by C16.Value, C28.Value, C8.Value) AS GroupKey
    from [dbo].[_Combinations] as C16
         cross join
         [dbo].[_Combinations] as C28
         cross join
         [dbo].[_Combinations] as C8
   where C16.AttributeID = 16
     and C28.AttributeID = 28
     and C8.AttributeID = 8
  )  as p
  unpivot ( Value for AttributeIDStr in (V16, V28, V8)) as up

Both solutions avoid the multiplication aggregation workaround using exp(log()) that is used in some other answers, which is very sensitive to rounding errors.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269443

I think this solves your problem.

Here is the approach. First, observe that the final data has the product of the number of each attribute -- 2*7*11 = 154 rows. Then observe that each value occurs a fixed number of times. For AttributeId = 16, each value occurs 154 / 2, because there are two values.

So, the idea is to calculate the number of times that each value appears. Then, generate the list of all the values. The final challenge is to assign the group numbers to these. For this, I use row_number() partitioned by the attribute id. To be honest, I'm not 100% that the grouping assignment is correct (it makes sense and it passed the eyeball test), but I'm worried that I'm missing a subtlety.

Here is the query:

with attributecount1 as (
      select c.AttributeId, count(*) as cnt
      from _Combinations c
      group by c.AttributeId
     ),
     const as (
      select exp(sum(log(cnt))) as tot, count(*) as numattr
      from attributecount1
     ),
     attributecount as (
       select a.*,
              (tot / a.cnt) as numtimes
       from attributecount1 a cross join const
     ),
     thevalues as (
      select c.AttributeId, c.Value, ac.numtimes, 1 as seqnum
      from AttributeCount ac join
           _Combinations c
           on ac.AttributeId = c.AttributeId
      union all
      select v.AttributeId, v.Value, v.numtimes, v.seqnum + 1
      from thevalues v
      where v.seqnum + 1 <= v.numtimes
     )
select row_number() over (partition by AttributeId order by seqnum, Value) as groupnum,
      *
from thevalues
order by 1, 2

The SQL Fiddle is here.

EDIT:

Unfortunately, I don't have access to SQL Server today and SQL Fiddle is acting up.

The problem is solvable. The above solution works, but -- as stated in my comment -- only when the dimensions are pairwise mutually prime. The problem is the assignment of the group number to the values. It turns out that this is a problem in number theory.

Essentially, we want to enumerate the combinations. If there were 2 in two groups, then it would be:

group 0:  1    1
group 1:  1    2
group 2:  2    1
group 3:  2    2

You can see a relationship between the group number and which values are assigned -- based on the binary representation of the group number. If this were 2x3, then it would look like:

group 0:  1    1
group 1:  1    2
group 2:  1    3
group 3:  2    1
group 4:  2    2
group 5:  2    3

Same idea, but now there is not "binary" representation. Each position in the number would have a different base. No problem.

So, the challenge is mapping a number (such as the group number) to each digit. This requires appropriate division and modulo arithmetic.

The following implements this in Postgres:

with c as (
      select 1 as attrid, '1' as val union all
      select 1 as attrid, '2' as val union all
      select 2 as attrid, 'A' as val union all
      select 2 as attrid, 'B' as val union all
      select 3 as attrid, '10' as val union all
      select 3 as attrid, '20' as val 
     ),
     c1 as (
       select c.*, dense_rank() over (order by attrid) as attrnum,
              dense_rank() over (partition by attrid order by val) as valnum,
              count(*) over (partition by attrid) as cnt
       from c
     ),
     a1 as (
       select attrid, count(*) as cnt,
              cast(round(exp(sum(ln(count(*))) over (order by attrid rows between unbounded preceding and current row))) as int)/count(*) as cum
       from c
       group by attrid
     ),
     a2 as (
       select a.*,
              (select cast(round(exp(sum(ln(cnt)))) as int)
               from a1
               where a1.attrid <= a.attrid
              ) / cnt as cum
       from a1 a
     ),
     const as (
       select cast(round(exp(sum(ln(cnt)))) as int) as numrows
       from a1
     ),
     nums as (
       select 1 as n union all select 2 union all select 3 union all select 4 union all
       select 5 union all select 6 union all select 7 union all select 8
       from const
     ),
     ac as (
      select c1.*, a1.cum, const.numrows
      from c1 join
           a1 on c1.attrid = a1.attrid cross join
           const
     )
select *
from nums join
     ac
     on (nums.n/cum) % cnt = valnum - 1
order by 1, 2;

(Note: generate_series() was not working correctly for some reason with certain joins, which is why it manually generates the sequence of numbers.)

When SQL Fiddle gets working again, I should be able to translate this back to SQL Server.

EDIT II:

Here is the version that works in SQL Server:

with attributecount1 as (
      select c.AttributeId, count(*) as cnt
      from _Combinations c
      group by c.AttributeId
     ),
     const as (
      select cast(round(exp(sum(log(cnt))), 1) as int) as tot, count(*) as numattr
      from attributecount1
     ),
     attributecount as (
       select a.*,
              (tot / a.cnt) as numtimes,
              (select cast(round(exp(sum(log(ac1.cnt))), 1) as int)
               from attributecount1 ac1
               where ac1.AttributeId <= a.AttributeId
              ) / a.cnt as cum
       from attributecount1 a cross join const
     ),
     c as (
       select c.*, ac.numtimes, ac.cum, ac.cnt,
              dense_rank() over (order by c.AttributeId) as attrnum,
              dense_rank() over (partition by c.AttributeId order by Value) as valnum
       from _Combinations c join
            AttributeCount ac
            on ac.AttributeId = c.AttributeId
     ),
     nums as (
       select 1 as n union all
       select 1 + n
       from nums cross join const
       where 1 + n <= const.tot
     )
select *
from nums join
     c
     on (nums.n / c.cum)%c.cnt = c.valnum - 1
option (MAXRECURSION 1000)

THe SQL Fiddle is here.

Upvotes: 6

geomagas
geomagas

Reputation: 3280

I've decided to post this, just for the sake of a procedural solution appearing in parallel with the CTE-based ones.

The following produces a zero-based GroupKey column. If you want it to start from 1, simply change @i to @i+1 in the last insert ... select.

-- Add a zero-based row number, partitioned by AttributeId
declare @Attrs table (AttributeId int,Value varchar(50),RowNum int)
insert into @Attrs
select 
  AttributeId,Value,
  ROW_NUMBER()over(partition by AttributeId order by AttributeId,Value)-1
from _Combinations

-- AttributeId value counts
declare @AttCount table (AttributeId int,n int)
insert into @AttCount
select AttributeId,COUNT(*) n from @Attrs
group by AttributeID

-- Total number of combos -- Multiply all AttributeId counts
-- EXP(SUM(LOG(n))) didnt work as expected
-- so fall back to good old cursors...
declare @ncombos int,@num int
declare mulc cursor for select n from @AttCount
open mulc
set @ncombos=1
fetch next from mulc into @num
while @@FETCH_STATUS=0
  begin
  set @ncombos=@ncombos*@num
  fetch next from mulc into @num
  end
close mulc
deallocate mulc

-- Now let's get our hands dirty...
declare @i int,@m int,@atid int,@n int,@r int
declare c cursor for select AttributeId,n from @AttCount
open c
fetch next from c into @atid,@n
set @m=1
while @@FETCH_STATUS=0
  begin
  set @i=0
  while @i<@ncombos
    begin
    set @r=(@i/@m)%@n
    insert into _CombinedAttributes (GroupKey,AttributeId,Value)
    select @i,@atid,value from @Attrs where AttributeId=@atid and RowNum=@r
    set @i=@i+1
    end
  set @m=@m*@n
  fetch next from c into @atid,@n
  end
close c
deallocate c

Hint: Here's why I didn't use exp(sum(log())) to emulate a mul() aggregate.

Upvotes: 1

nathan_jr
nathan_jr

Reputation: 9282

Years ago I faced a similar problem with a fixed EAV schema not unlike yours. Peter Larsson came up with the below solution to address my "dynamic combinations" query.

I've adapted it to fit your schema. Hope this helps!

SqlFiddle Here

;with cteSource (Iteration, AttributeID, recID, Items, Unq, Perm) as 
(   
    select  v.Number + 1,
            s.AttributeId,
            row_number() over (order by v.Number, s.AttributeID) - 1,
            s.Items,
            u.Unq,
            f.Perm
    from    (select AttributeID, count(*) from  _Combinations group by AttributeID) s(AttributeId, Items)
    cross 
    join    (select count(distinct AttributeID) from _Combinations) u (Unq)
    join    master..spt_values as v on v.Type = 'P'
    outer 
    apply   (
                select  top(1) cast(exp(sum(log(count(*))) over ()) as bigint)
                from    _Combinations as w
                where   w.AttributeID >= s.AttributeID
                group 
                by      w.AttributeID
                having  count(*) > 1
            ) as f(Perm)
    where   v.Number < (select top(1) exp(sum(log(count(*))) over()) from _Combinations as x group by x.AttributeID)
)
select  s.Iteration,
        s.AttributeID,
        w.Value     
from    cteSource as s
cross 
apply   (
            select  Value,
                    row_number() over (order by Value) - 1
            from    _Combinations
            where   AttributeID = s.AttributeID
        ) w(Value, recID)
where   coalesce(s.recID / (s.Perm * s.Unq / s.Items), 0) % s.Items = w.recID
order 
by      s.Iteration, s.AttributeId;

Upvotes: 1

Related Questions