Dacius
Dacius

Reputation: 94

Moving data over - need cleaner method

I have a data table with 24 columns (12-2 column groups). The placement of the data is all over the place and I need to compress the data into a legible Dimension table. Here is the look of it

Business Key | Group1 | Group2 | Group3 | Group8 | Group11 | Group12
00001        | abc123 |        | efg456 | ght456 |         | 

I need it to look like

Business Key | Group1 | Group2 | Group3 | Group8 | Group11 | Group12
00001        | abc123 | efg456 | ght456 |        |         |

I have tried Coalescing the data, but it repeats the columns in the latter groups once there is a blank. I have a feeling I need to temp table the data and loop it through a join, but I cannot seem to get it right.

I am pretty sure I can do it with 12 joins. But there has to be a more elegant solution, since I have over 80 million records to look through.

To clarify: I did the following

Group1 = Coalesce(group1,group2,group3,...,group11,group12)
Group2 = Coalesce(group2,group3,...,group11,group12)
Group3 = Coalesce(group3,...,group11,group12)

etc... The Coalesce works for the first gap, but it keeps moving everything over because it does not know that it already moved the data in the column before.

Upvotes: 1

Views: 53

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269753

The logic for coalesce() is a bit more complicated:

select coalesce(group1, group2, group3, . . . ) as group1,
       (case when group1 is not null then coalesce(group2, group3, group4, . . .)
             when group2 is not null then coalesce(group3, group4, group5, . . )
        . . .
        end) as group2,
       . . .

As you can see, this gets real complicated real fast. I wonder if the following would have respectable performance:

select dt.businesskey, p.*
from datatable dt cross apply
     (select max(case when seqnum = 1 then grp end) as grp1,
             max(case when seqnum = 2 then grp end) as grp2,
             . . .
      from (select grp, row_number() over (order by num) as seqnum
            from (values(dt.group1, 1),
                        (dt.group2, 2),
                        . . .
                 ) v(grp, num)
            where grp is not null
           ) p
     ) p;

This looks quite complicated. But SQL Server does a pretty good job of optimizing apply for within-row transformations. It is worth trying to see if this works for you.

Upvotes: 2

Related Questions