JodeCharger100
JodeCharger100

Reputation: 1059

Create a new variable if value in var1 exists in var2

Assume I have a list_a variable with all possible sports played in the world:

football 
tennis 
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer 

Also assume I have another variable list_b of only three sports:

cricket
hockey
swimming

I want to create a new variable Cont, which will equal 1 when the sports in list_a are found in list_b, and equal to 0 when the sport is not in list_b.

This is what variable Cont would look like:

0
0
1
1
0
0
1
0
0
0
1 
1
0

Will the following work:

gen Cont = 0
replace Cont = 1 if  (strmatch( list_a, ( list_b)))

EDIT:

Suppose list_a also contained hoccckey (which is a typo) but I still want it to get counted.

Is there a way to do that?

Upvotes: 0

Views: 205

Answers (2)

user8682794
user8682794

Reputation:

The answer is no because your approach will compare the values of the two variables in each observation. Instead, you need to compare the value at each row of list_a, with all values of variable list_b.

Using your toy example:

clear

input strL(list_a list_b)
football cricket
tennis hockey 
hockey swimming 
cricket 
croquet 
racquetball 
cricket
pingpong
squash
rugby
swimming 
swimming
soccer
end

The following illustrates the philosophy:

local obs = _N
generate Cont = 0

forvalues i = 1 / `obs' {
    forvalues j = 1 / `obs' {
        replace Cont = 1 if list_a[`i'] == list_b[`j'] in `i'
    }
}

list
     +-------------------------------+
     |      list_a     list_b   Cont |
     |-------------------------------|
  1. |    football    cricket      0 |
  2. |      tennis     hockey      0 |
  3. |      hockey   swimming      1 |
  4. |     cricket                 1 |
  5. |     croquet                 0 |
     |-------------------------------|
  6. | racquetball                 0 |
  7. |     cricket                 1 |
  8. |    pingpong                 0 |
  9. |      squash                 0 |
 10. |       rugby                 0 |
     |-------------------------------|
 11. |    swimming                 1 |
 12. |    swimming                 1 |
 13. |      soccer                 0 |
     +-------------------------------+

EDIT:

If you have certain typos that you additionally want to take into account, you can combine my solution with @NickCox's. In the above loop use instead:

replace Cont = 1 if inlist(list_a, "hoccckey") | list_a[`i'] == list_b[`j'] in `i'

Upvotes: 2

Nick Cox
Nick Cox

Reputation: 37208

There is a simple technique that works fine for your toy example:

clear 
input strL list_a 
football 
tennis 
hockey
cricket
croquet
racquetball
cricket
pingpong
squash
rugby
swimming
swimming
soccer 
end 

gen wanted = inlist(list_a, "cricket", "hockey", "swimming") 

list, sepby(wanted)

     +----------------------+
     |      list_a   wanted |
     |----------------------|
  1. |    football        0 |
  2. |      tennis        0 |
     |----------------------|
  3. |      hockey        1 |
  4. |     cricket        1 |
     |----------------------|
  5. |     croquet        0 |
  6. | racquetball        0 |
     |----------------------|
  7. |     cricket        1 |
     |----------------------|
  8. |    pingpong        0 |
  9. |      squash        0 |
 10. |       rugby        0 |
     |----------------------|
 11. |    swimming        1 |
 12. |    swimming        1 |
     |----------------------|
 13. |      soccer        0 |
     +----------------------+

If you had many more values, you could loop over the distinct values sought, using levelsof if they are in a second variable, or put the candidates in a separate dataset and merge as explained in this FAQ.

All these techniques depend on exact equality of strings, so watch out for variations between upper and lower case, leading and trailing spaces and inconsistencies in spelling.

Upvotes: 1

Related Questions