Reputation: 1059
Assume I have a list_a
variable with all possible sports played in the world:
football
tennis
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer
Also assume I have another variable list_b
of only three sports:
cricket
hockey
swimming
I want to create a new variable Cont
, which will equal 1
when the sports in list_a
are found in list_b
, and equal to 0
when the sport is not in list_b
.
This is what variable Cont
would look like:
0
0
1
1
0
0
1
0
0
0
1
1
0
Will the following work:
gen Cont = 0
replace Cont = 1 if (strmatch( list_a, ( list_b)))
EDIT:
Suppose list_a
also contained hoccckey
(which is a typo) but I still want it to get counted.
Is there a way to do that?
Upvotes: 0
Views: 205
Reputation:
The answer is no because your approach will compare the values of the two variables in each observation. Instead, you need to compare the value at each row of list_a
, with all values of variable list_b
.
Using your toy example:
clear
input strL(list_a list_b)
football cricket
tennis hockey
hockey swimming
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer
end
The following illustrates the philosophy:
local obs = _N
generate Cont = 0
forvalues i = 1 / `obs' {
forvalues j = 1 / `obs' {
replace Cont = 1 if list_a[`i'] == list_b[`j'] in `i'
}
}
list
+-------------------------------+
| list_a list_b Cont |
|-------------------------------|
1. | football cricket 0 |
2. | tennis hockey 0 |
3. | hockey swimming 1 |
4. | cricket 1 |
5. | croquet 0 |
|-------------------------------|
6. | racquetball 0 |
7. | cricket 1 |
8. | pingpong 0 |
9. | squash 0 |
10. | rugby 0 |
|-------------------------------|
11. | swimming 1 |
12. | swimming 1 |
13. | soccer 0 |
+-------------------------------+
EDIT:
If you have certain typos that you additionally want to take into account, you can combine my solution with @NickCox's. In the above loop use instead:
replace Cont = 1 if inlist(list_a, "hoccckey") | list_a[`i'] == list_b[`j'] in `i'
Upvotes: 2
Reputation: 37208
There is a simple technique that works fine for your toy example:
clear
input strL list_a
football
tennis
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer
end
gen wanted = inlist(list_a, "cricket", "hockey", "swimming")
list, sepby(wanted)
+----------------------+
| list_a wanted |
|----------------------|
1. | football 0 |
2. | tennis 0 |
|----------------------|
3. | hockey 1 |
4. | cricket 1 |
|----------------------|
5. | croquet 0 |
6. | racquetball 0 |
|----------------------|
7. | cricket 1 |
|----------------------|
8. | pingpong 0 |
9. | squash 0 |
10. | rugby 0 |
|----------------------|
11. | swimming 1 |
12. | swimming 1 |
|----------------------|
13. | soccer 0 |
+----------------------+
If you had many more values, you could loop over the distinct values sought, using levelsof
if they are in a second variable, or put the candidates in a separate dataset and merge
as explained in this FAQ.
All these techniques depend on exact equality of strings, so watch out for variations between upper and lower case, leading and trailing spaces and inconsistencies in spelling.
Upvotes: 1