Olga
Olga

Reputation: 43

Duplication of data entries by id if they meet a certain condition

In the original choice data set, individuals (id) are captured making purchases (choice) among all the product options possible (assortchoice is a product code). Every individual always faces the same set of products to choose from; as a result the value of choice is always either 0 or 1 ("was the product chosen or not?").

clear
input
id  assortchoice    choice  sumchoice
2   12  1   2
2   13  0   2
2   14  0   2
2   15  0   2
2   16  0   2
2   17  0   2
2   18  0   2
2   19  0   2
2   20  0   2
2   21  0   2
2   22  0   2
2   23  1   2

3   12  1   1
3   13  0   1
3   14  0   1
3   15  0   1
3   16  0   1
3   17  0   1
3   18  0   1
3   19  0   1
3   20  0   1
3   21  0   1
3   22  0   1
3   23  0   1

4   12  1   3
4   13  0   3
4   14  1   3
4   15  1   3
4   16  0   3
4   17  0   3
4   18  0   3
4   19  0   3
4   20  0   3
4   21  0   3
4   22  0   3
4   23  0   3
end

I created the following code to understand how many choices were made by each individual:

egen sumchoice=total(choice), by(id)

In this example, an individual 3 (id=3) only chose one product (since sumchoice=1), but individual 2 made two choices (sumchoice=2). Finally, individual 4 made three choices (sumchoice=3).

Since this is a choice data, I need to transform all the instances of multiple choices into sets of single choices.

What I mean by that: if an individual made two purchases, I need to duplicate the choice set for that individual twice; for an individual who made 3 purchases, I need to replicate the choice set three times, so the final structure looks like the data set below.

clear
input
id  transaction assortchoice    choice
2   1   12  1
2   1   13  0
2   1   14  0
2   1   15  0
2   1   16  0
2   1   17  0
2   1   18  0
2   1   19  0
2   1   20  0
2   1   21  0
2   1   22  0
2   1   23  0

2   2   12  0
2   2   13  0
2   2   14  0
2   2   15  0
2   2   16  0
2   2   17  0
2   2   18  0
2   2   19  0
2   2   20  0
2   2   21  0
2   2   22  0
2   2   23  1
            
3   1   12  1
3   1   13  0
3   1   14  0
3   1   15  0
3   1   16  0
3   1   17  0
3   1   18  0
3   1   19  0
3   1   20  0
3   1   21  0
3   1   22  0
3   1   23  0
            
4   1   12  1
4   1   13  0
4   1   14  0
4   1   15  0
4   1   16  0
4   1   17  0
4   1   18  0
4   1   19  0
4   1   20  0
4   1   21  0
4   1   22  0
4   1   23  0
            
4   2   12  0
4   2   13  0
4   2   14  1
4   2   15  0
4   2   16  0
4   2   17  0
4   2   18  0
4   2   19  0
4   2   20  0
4   2   21  0
4   2   22  0
4   2   23  0
            
4   3   12  0
4   3   13  0
4   3   14  0
4   3   15  1
4   3   16  0
4   3   17  0
4   3   18  0
4   3   19  0
4   3   20  0
4   3   21  0
4   3   22  0
4   3   23  0

end

***update:

transaction indicates which transaction order this is:

bysort id assortchoice (choice): gen transaction=_n

Hence, choice=1 should appear only once per each transaction.

Upvotes: 1

Views: 52

Answers (1)

Nick Cox
Nick Cox

Reputation: 37208

The answer isn't quite "use expand" as there is a twist that you don't want exact replicates.

expand sumchoice 
bysort id assortchoice (choice) : replace choice = 0 if _n != _N & choice == 1 
list if id == 2 , sepby(assortchoice)

     +-----------------------------------+
     | id   assort~e   choice   sumcho~e |
     |-----------------------------------|
  1. |  2         12        0          2 |
  2. |  2         12        1          2 |
     |-----------------------------------|
  3. |  2         13        0          2 |
  4. |  2         13        0          2 |
     |-----------------------------------|
  5. |  2         14        0          2 |
  6. |  2         14        0          2 |
     |-----------------------------------|
  7. |  2         15        0          2 |
  8. |  2         15        0          2 |
     |-----------------------------------|
  9. |  2         16        0          2 |
 10. |  2         16        0          2 |
     |-----------------------------------|
 11. |  2         17        0          2 |
 12. |  2         17        0          2 |
     |-----------------------------------|
 13. |  2         18        0          2 |
 14. |  2         18        0          2 |
     |-----------------------------------|
 15. |  2         19        0          2 |
 16. |  2         19        0          2 |
     |-----------------------------------|
 17. |  2         20        0          2 |
 18. |  2         20        0          2 |
     |-----------------------------------|
 19. |  2         21        0          2 |
 20. |  2         21        0          2 |
     |-----------------------------------|
 21. |  2         22        0          2 |
 22. |  2         22        0          2 |
     |-----------------------------------|
 23. |  2         23        0          2 |
 24. |  2         23        1          2 |
     +-----------------------------------+

Upvotes: 1

Related Questions