Reputation: 10412
I have a Table Users (it has millions of rows)
Id Name Country Product
+----+---------------+---------------+--------------+
1 John Canada
2 Kate Argentina
3 Mark China
4 Max Canada
5 Sam Argentina
6 Stacy China
...
1000 Ken Canada
I want to fill the Product
column with A
, B
or C
based on percentages.
I have another table called CountriesStats like the following
Id Country A B C
+-----+---------------+--------------+-------------+----------+
1 Canada 60 20 20
2 Argentina 35 45 20
3 China 40 10 50
This table holds the percentage of people with each product. For example in Canada 60% of people have product A, 20% have product B and 20% have product C.
I would like to fill the Users table with data based on the Percentages in the second data. So for example if there are 1 million user in canada, I would like to fill 600000 of the Product
column in the Users table with A
200000 with B
and 200000 with C
Thanks for any help on how to do that. I do not mind doing it in multiple steps I jsut need hints on how can I achieve that in SQL
Upvotes: 3
Views: 93
Reputation: 1271003
The logic behind this is not too difficult. Assign a sequential counter to each person in each country. Then, using this value, assign the correct product based on this value. For instance, in your example, when the number is less than or equal to 600,000 then 'A' gets assigned. For 600,001 to 800,000 then 'B', and finally 'C' to the rest.
The following SQL accomplishes this:
with toupdate as (
select u.*,
row_number() over (partition by country order by newid()) as seqnum,
count(*) over (partition by country) as tot
from users u
)
update u
set product = (case when seqnum <= tot * A / 100 then 'A'
when seqnum <= tot * (A + B) / 100 then 'B'
else 'C'
end)
from toupdate u join
CountriesStats cs
on u.country = cs.country;
The with
statement defines an updatable subquery with the sequence number and total for each each country, on each row. This is a nice feature of SQL Server, but is not supported in all databases.
The from
statement is joining back to the CountriesStats
table to get the needed values for each country. And the case
statement does the necessary logic.
Note that the sequential number is assigned randomly, using newid()
, so the products should be assigned randomly through the initial table.
Upvotes: 2