PanosPlat
PanosPlat

Reputation: 958

Delete random records [0-4 deletions] per group [date]

I need to create dummy data for a presentations.

I have been given 10 records which I need to copy them and change their date each time for every day of the month. Then I need to delete random records per date ranging from 0 up to 4 records each day (so they seem more random data)

A simplified example

dummyId Name  onDate
1       xd    2016-07-01
2       gd    2016-07-01
3       yd    2016-07-01
4       ad    2016-07-01
5       bd    2016-07-01
6       zd    2016-07-01
7       md    2016-07-01
8       qd    2016-07-01
9       nd    2016-07-01
10      dd    2016-07-01
11       xd    2016-07-02
12       gd    2016-07-02
13       yd    2016-07-02
14       ad    2016-07-02
15       bd    2016-07-02
16       zd    2016-07-02
17       md    2016-07-02
18       qd    2016-07-02
19       nd    2016-07-02
20       dd    2016-07-02

..

In the above example the records of the 1st of July where copied over having date 2d of July. This will go on for every July date. After that I need to delete from every group of dates [except from 1st of July] (Group By onDate) 0-4 records so the data to seem random.

eg below 3 records deleted for 2d of July

dummyId    Name  onDate
    1       xd    2016-07-01
    2       gd    2016-07-01
    3       yd    2016-07-01
    4       ad    2016-07-01
    5       bd    2016-07-01
    6       zd    2016-07-01
    7       md    2016-07-01
    8       qd    2016-07-01
    9       nd    2016-07-01
    10      dd    2016-07-01
    11       xd    2016-07-02
    13       yd    2016-07-02
    14       ad    2016-07-02
    16       zd    2016-07-02
    17       md    2016-07-02
    18       qd    2016-07-02
    20       dd    2016-07-02
.. 

Upvotes: 1

Views: 274

Answers (2)

Zohar Peled
Zohar Peled

Reputation: 82514

One way to do it is to create a cte that will generate a random number for each record and a row number randomly ordered for the records in each date:

Create and populate sample table (please, save us this step in your next questions):

DECLARE @T as table
(
    dummyId int,
    Name char(2),
    onDate date
)

INSERT INTO @T VALUES
(1, 'xd', '2016-07-01'),
(2, 'gd', '2016-07-01'),
(3, 'yd', '2016-07-01'),
(4, 'ad', '2016-07-01'),
(5, 'bd', '2016-07-01'),
(6, 'zd', '2016-07-01'),
(7, 'md', '2016-07-01'),
(8, 'qd', '2016-07-01'),
(9, 'nd', '2016-07-01'),
(10, 'dd', '2016-07-01'),
(11, 'xd', '2016-07-02'),
(12, 'gd', '2016-07-02'),
(13, 'yd', '2016-07-02'),
(14, 'ad', '2016-07-02'),
(15, 'bd', '2016-07-02'),
(16, 'zd', '2016-07-02'),
(17, 'md', '2016-07-02'),
(18, 'qd', '2016-07-02'),
(19, 'nd', '2016-07-02'),
(20, 'dd', '2016-07-02')

Create a cte. Note that it contains a random number between 0 and 5 (inclusive), and a row number between 1 and 10 for each date, sorted randomly:

;WITH CTE AS
(
    SELECT  dummyId, 
            Name, 
            onDate, 
            ABS(CHECKSUM(NEWID()) % 6) as random,
            ROW_NUMBER() OVER(PARTITION BY onDate ORDER BY NEWID()) As rowNumber
    FROM @T 
    WHERE onDate <> '2016-07-01'
)

Delete from the cte. Note that the where clause is checking that random number is smaller then the row number. The where clause ensures that for each date, a random number of 0 to 4 records will be deleted:

DELETE
FROM CTE
WHERE rowNumber < random

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270573

If you need to delete 0-4 records, you can use a random number. This is a pain, but in SQL Server you can do something like this:

delete s
    from simplified s
    where onDate <> '2016-07-01' and
          rand(checksum(newid())) < 0.15;

This will remove about 15% of the records. This should be 1-4 records for each date (although there are no guarantees on the exact number for each date).

Another way to remove a fixed number for each date would be. For instance, to return three records from each date:

with todelete as (
      select s.*, row_number() over (partition by date order by newid()) as seqnum
      from simplified s
     )
delete todelete
    where seqnum <= 3 and onDate <> '2016-07-01';

Upvotes: 1

Related Questions