Reputation: 3
I conducted a phone survey and here is the prototype of my dataset:
var1 var2
6666 1
6666 2
7676 2
7676 1
8876 1
8876 2
89898 1
89898 2
9999 1
9999 2
5656 1
5656 2
2323 1
2323 2
9876 1
7654 1
var1
is the unique identifier for each case in my survey (in this case, phone numbers).
var2
is the outcome of the survey: 1 (successful), 2 (not successful).
I want keep the observations for each var1
whose var2 == 1
, yet retaining the observations for each var1
whosevar2 == 2
if there is no another case where var2 == 1
.
I have tried
duplicates drop var1 if var2 == 2, force
but I am not getting the desired output
Upvotes: 0
Views: 74
Reputation: 37208
The question is wrongly titled: you don't want to overwrite anything.
Your syntax doesn't work as you wish because it is not what you want. You are asking whether there are duplicates of var1 if var2 == 2
and that command pays no attention whatsoever to observations for which var2 == 1
.
Your example includes no observations for which var2 == 2
but there is no corresponding observation with var2 == 1
. I have added one such.
Here's one way of meeting your goal. I show in passing that the duplicates
command you have does nothing for this example; nor would it be expected to do anything.
. clear
. input var1 var2
var1 var2
1. 6666 1
2. 6666 2
3. 7676 2
4. 7676 1
5. 8876 1
6. 8876 2
7. 89898 1
8. 89898 2
9. 9999 1
10. 9999 2
11. 5656 1
12. 5656 2
13. 2323 1
14. 2323 2
15. 9876 1
16. 7654 1
17. 42 2
18. end
. duplicates list var1 if var2 == 2
Duplicates in terms of var1
(0 observations are duplicates)
. bysort var1 (var2) : assert _N == 1 | _N == 2
. by var1 : drop if _n == 2 & var2[2] == 2
(7 observations deleted)
. list, sepby(var1)
+--------------+
| var1 var2 |
|--------------|
1. | 42 2 |
|--------------|
2. | 2323 1 |
|--------------|
3. | 5656 1 |
|--------------|
4. | 6666 1 |
|--------------|
5. | 7654 1 |
|--------------|
6. | 7676 1 |
|--------------|
7. | 8876 1 |
|--------------|
8. | 9876 1 |
|--------------|
9. | 9999 1 |
|--------------|
10. | 89898 1 |
+--------------+
Another way to do it would be
. bysort var1 (var2) : keep if _n == 1 & var2[2] == 2
In fact
. bysort var1 (var2): keep if _n == 1
keeps observations with var2 == 1
if there are any and otherwise will also keep singletons with var2 == 2
.
The hidden assumptions seem to include at most two observations for each distinct var1
. Note the use of assert
for checking assumptions about the dataset.
Upvotes: 1