Y2theZ
Y2theZ

Reputation: 10412

Fill Users table with data using percentages from another table

I have a Table Users (it has millions of rows)

  Id         Name         Country          Product   
+----+---------------+---------------+--------------+
  1          John          Canada             
  2          Kate          Argentina                  
  3          Mark          China
  4          Max           Canada
  5          Sam           Argentina  
  6          Stacy         China           
  ...
  1000       Ken           Canada 

I want to fill the Product column with A, B or C based on percentages.

I have another table called CountriesStats like the following

  Id        Country         A             B            C
+-----+---------------+--------------+-------------+----------+
  1          Canada          60            20           20
  2          Argentina       35            45           20
  3          China           40            10           50

This table holds the percentage of people with each product. For example in Canada 60% of people have product A, 20% have product B and 20% have product C.

I would like to fill the Users table with data based on the Percentages in the second data. So for example if there are 1 million user in canada, I would like to fill 600000 of the Product column in the Users table with A 200000 with B and 200000 with C

Thanks for any help on how to do that. I do not mind doing it in multiple steps I jsut need hints on how can I achieve that in SQL

Upvotes: 3

Views: 93

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271003

The logic behind this is not too difficult. Assign a sequential counter to each person in each country. Then, using this value, assign the correct product based on this value. For instance, in your example, when the number is less than or equal to 600,000 then 'A' gets assigned. For 600,001 to 800,000 then 'B', and finally 'C' to the rest.

The following SQL accomplishes this:

with toupdate as (
      select u.*,
             row_number() over (partition by country order by newid()) as seqnum,
             count(*) over (partition by country) as tot
      from users u
     )
update u
    set product = (case when seqnum <= tot * A / 100 then 'A'
                        when seqnum <= tot * (A + B) / 100 then 'B'
                        else 'C'
                   end)
    from toupdate u join
         CountriesStats cs
         on u.country = cs.country;

The with statement defines an updatable subquery with the sequence number and total for each each country, on each row. This is a nice feature of SQL Server, but is not supported in all databases.

The from statement is joining back to the CountriesStats table to get the needed values for each country. And the case statement does the necessary logic.

Note that the sequential number is assigned randomly, using newid(), so the products should be assigned randomly through the initial table.

Upvotes: 2

Related Questions