Reputation: 958
I need to create dummy data for a presentations.
I have been given 10 records which I need to copy them and change their date each time for every day of the month. Then I need to delete random records per date ranging from 0 up to 4 records each day (so they seem more random data)
A simplified example
dummyId Name onDate
1 xd 2016-07-01
2 gd 2016-07-01
3 yd 2016-07-01
4 ad 2016-07-01
5 bd 2016-07-01
6 zd 2016-07-01
7 md 2016-07-01
8 qd 2016-07-01
9 nd 2016-07-01
10 dd 2016-07-01
11 xd 2016-07-02
12 gd 2016-07-02
13 yd 2016-07-02
14 ad 2016-07-02
15 bd 2016-07-02
16 zd 2016-07-02
17 md 2016-07-02
18 qd 2016-07-02
19 nd 2016-07-02
20 dd 2016-07-02
..
In the above example the records of the 1st of July where copied over having date 2d of July. This will go on for every July date. After that I need to delete from every group of dates [except from 1st of July] (Group By onDate) 0-4 records so the data to seem random.
eg below 3 records deleted for 2d of July
dummyId Name onDate
1 xd 2016-07-01
2 gd 2016-07-01
3 yd 2016-07-01
4 ad 2016-07-01
5 bd 2016-07-01
6 zd 2016-07-01
7 md 2016-07-01
8 qd 2016-07-01
9 nd 2016-07-01
10 dd 2016-07-01
11 xd 2016-07-02
13 yd 2016-07-02
14 ad 2016-07-02
16 zd 2016-07-02
17 md 2016-07-02
18 qd 2016-07-02
20 dd 2016-07-02
..
Upvotes: 1
Views: 274
Reputation: 82514
One way to do it is to create a cte that will generate a random number for each record and a row number randomly ordered for the records in each date:
Create and populate sample table (please, save us this step in your next questions):
DECLARE @T as table
(
dummyId int,
Name char(2),
onDate date
)
INSERT INTO @T VALUES
(1, 'xd', '2016-07-01'),
(2, 'gd', '2016-07-01'),
(3, 'yd', '2016-07-01'),
(4, 'ad', '2016-07-01'),
(5, 'bd', '2016-07-01'),
(6, 'zd', '2016-07-01'),
(7, 'md', '2016-07-01'),
(8, 'qd', '2016-07-01'),
(9, 'nd', '2016-07-01'),
(10, 'dd', '2016-07-01'),
(11, 'xd', '2016-07-02'),
(12, 'gd', '2016-07-02'),
(13, 'yd', '2016-07-02'),
(14, 'ad', '2016-07-02'),
(15, 'bd', '2016-07-02'),
(16, 'zd', '2016-07-02'),
(17, 'md', '2016-07-02'),
(18, 'qd', '2016-07-02'),
(19, 'nd', '2016-07-02'),
(20, 'dd', '2016-07-02')
Create a cte. Note that it contains a random number between 0 and 5 (inclusive), and a row number between 1 and 10 for each date, sorted randomly:
;WITH CTE AS
(
SELECT dummyId,
Name,
onDate,
ABS(CHECKSUM(NEWID()) % 6) as random,
ROW_NUMBER() OVER(PARTITION BY onDate ORDER BY NEWID()) As rowNumber
FROM @T
WHERE onDate <> '2016-07-01'
)
Delete from the cte. Note that the where clause is checking that random number is smaller then the row number. The where clause ensures that for each date, a random number of 0 to 4 records will be deleted:
DELETE
FROM CTE
WHERE rowNumber < random
Upvotes: 0
Reputation: 1270573
If you need to delete 0-4 records, you can use a random number. This is a pain, but in SQL Server you can do something like this:
delete s
from simplified s
where onDate <> '2016-07-01' and
rand(checksum(newid())) < 0.15;
This will remove about 15% of the records. This should be 1-4 records for each date (although there are no guarantees on the exact number for each date).
Another way to remove a fixed number for each date would be. For instance, to return three records from each date:
with todelete as (
select s.*, row_number() over (partition by date order by newid()) as seqnum
from simplified s
)
delete todelete
where seqnum <= 3 and onDate <> '2016-07-01';
Upvotes: 1