Reputation: 43
In the original choice data set, individuals (id
) are captured making purchases (choice
) among all the product options possible (assortchoice
is a product code). Every individual always faces the same set of products to choose from; as a result the value of choice
is always either 0 or 1 ("was the product chosen or not?").
clear
input
id assortchoice choice sumchoice
2 12 1 2
2 13 0 2
2 14 0 2
2 15 0 2
2 16 0 2
2 17 0 2
2 18 0 2
2 19 0 2
2 20 0 2
2 21 0 2
2 22 0 2
2 23 1 2
3 12 1 1
3 13 0 1
3 14 0 1
3 15 0 1
3 16 0 1
3 17 0 1
3 18 0 1
3 19 0 1
3 20 0 1
3 21 0 1
3 22 0 1
3 23 0 1
4 12 1 3
4 13 0 3
4 14 1 3
4 15 1 3
4 16 0 3
4 17 0 3
4 18 0 3
4 19 0 3
4 20 0 3
4 21 0 3
4 22 0 3
4 23 0 3
end
I created the following code to understand how many choices were made by each individual:
egen sumchoice=total(choice), by(id)
In this example, an individual 3 (id=3) only chose one product (since sumchoice=1), but individual 2 made two choices (sumchoice=2). Finally, individual 4 made three choices (sumchoice=3).
Since this is a choice data, I need to transform all the instances of multiple choices into sets of single choices.
What I mean by that: if an individual made two purchases, I need to duplicate the choice set for that individual twice; for an individual who made 3 purchases, I need to replicate the choice set three times, so the final structure looks like the data set below.
clear
input
id transaction assortchoice choice
2 1 12 1
2 1 13 0
2 1 14 0
2 1 15 0
2 1 16 0
2 1 17 0
2 1 18 0
2 1 19 0
2 1 20 0
2 1 21 0
2 1 22 0
2 1 23 0
2 2 12 0
2 2 13 0
2 2 14 0
2 2 15 0
2 2 16 0
2 2 17 0
2 2 18 0
2 2 19 0
2 2 20 0
2 2 21 0
2 2 22 0
2 2 23 1
3 1 12 1
3 1 13 0
3 1 14 0
3 1 15 0
3 1 16 0
3 1 17 0
3 1 18 0
3 1 19 0
3 1 20 0
3 1 21 0
3 1 22 0
3 1 23 0
4 1 12 1
4 1 13 0
4 1 14 0
4 1 15 0
4 1 16 0
4 1 17 0
4 1 18 0
4 1 19 0
4 1 20 0
4 1 21 0
4 1 22 0
4 1 23 0
4 2 12 0
4 2 13 0
4 2 14 1
4 2 15 0
4 2 16 0
4 2 17 0
4 2 18 0
4 2 19 0
4 2 20 0
4 2 21 0
4 2 22 0
4 2 23 0
4 3 12 0
4 3 13 0
4 3 14 0
4 3 15 1
4 3 16 0
4 3 17 0
4 3 18 0
4 3 19 0
4 3 20 0
4 3 21 0
4 3 22 0
4 3 23 0
end
***update:
transaction
indicates which transaction order this is:
bysort id assortchoice (choice): gen transaction=_n
Hence, choice=1 should appear only once per each transaction.
Upvotes: 1
Views: 52
Reputation: 37208
The answer isn't quite "use expand
" as there is a twist that you don't want exact replicates.
expand sumchoice
bysort id assortchoice (choice) : replace choice = 0 if _n != _N & choice == 1
list if id == 2 , sepby(assortchoice)
+-----------------------------------+
| id assort~e choice sumcho~e |
|-----------------------------------|
1. | 2 12 0 2 |
2. | 2 12 1 2 |
|-----------------------------------|
3. | 2 13 0 2 |
4. | 2 13 0 2 |
|-----------------------------------|
5. | 2 14 0 2 |
6. | 2 14 0 2 |
|-----------------------------------|
7. | 2 15 0 2 |
8. | 2 15 0 2 |
|-----------------------------------|
9. | 2 16 0 2 |
10. | 2 16 0 2 |
|-----------------------------------|
11. | 2 17 0 2 |
12. | 2 17 0 2 |
|-----------------------------------|
13. | 2 18 0 2 |
14. | 2 18 0 2 |
|-----------------------------------|
15. | 2 19 0 2 |
16. | 2 19 0 2 |
|-----------------------------------|
17. | 2 20 0 2 |
18. | 2 20 0 2 |
|-----------------------------------|
19. | 2 21 0 2 |
20. | 2 21 0 2 |
|-----------------------------------|
21. | 2 22 0 2 |
22. | 2 22 0 2 |
|-----------------------------------|
23. | 2 23 0 2 |
24. | 2 23 1 2 |
+-----------------------------------+
Upvotes: 1