iPlexpen

Reputation: 409

Create variable based on value in multiple columns?

There is a rather large Stata dataset (education) with 60+ variables devoted to 'exam taken' information and a few others based on student gender, age, demographics, etc. There are tens of thousands of students (rows). Unfortunately the grades on the various tests are not standardized: they are a combination of letters and numbers, and may appear in any of the 60+ columns for each student, depending on when they took the relevant exam. I'm trying to create a new variable identifying all those who took some variation of the G40 or G41 exam at this time. The grade columns are all named dx followed by a number, so I've started by trying the following:

    gen byte event = 0 
    replace event = 1 if dx1 == "G40" | dx1 == "G41"| dx2 == "G40" | dx2 == "G41" | dx3 == "G40" | dx3 == "G41" | dx4 == "G40" | dx4 == "G41" | dx5 == "G40" | dx5 == "G41" & age < 12

I don't want to write out every single one of the 60+ columns each time I'm making a new variable for a new exam. Is there a faster way of doing this?

Upvotes: 0

Views: 786

Answers (1)

Nick Cox

Reputation: 37183

I am going to show two techniques, as one is good for the smaller code example you give and one is better for 60+ "columns" (variables!).

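I would tend to write just your example as a single command, with parentheses so that age < 12 applies to every comparison (in Stata & binds more tightly than |, so without parentheses it would apply only to the last one):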

    gen byte event = ( inlist("G40", dx1, dx2, dx3, dx4, dx5) | ///
                       inlist("G41", dx1, dx2, dx3, dx4, dx5) ) & age < 12

For 60+ such variables I would write a loop, not least because inlist() accepts at most 10 arguments when they are strings:

    gen byte event = 0

    foreach v of var dx* {
        display "`v' " _c
        replace event = 1 if inlist(`v', "G40", "G41") & age < 12
    }

Here the display line prints each variable name as the loop reaches it, so for purposes of debugging, or just understanding, the output is noisier than would be customary once the operations seem routine.

A standard trick with inlist() is to note that a test of the form foo == whatever is the same as a test of whatever == foo, so there is often a choice about which argument goes first and which other argument(s) follow.
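The question mentions "some variation of" G40 or G41. As a minimal sketch only, and assuming (this is not stated in the question) that variations are codes that merely begin with those three characters, such as a hypothetical "G41.2", the loop can compare just the first three characters of each dx variable. The toy dataset below is invented for illustration and stands in for the real file.

```stata
* Toy data (hypothetical): three dx variables instead of 60+.
clear
input str5 dx1 str5 dx2 str5 dx3 byte age
"G40"   "A12"   ""      10
"B07"   "G41.2" ""      11
"G40"   ""      ""      15
"C33"   "D01"   "E99"   9
end

gen byte event = 0

foreach v of varlist dx* {
    * substr() keeps the first three characters, so "G41.2" is compared as "G41"
    replace event = 1 if inlist(substr(`v', 1, 3), "G40", "G41") & age < 12
}

list
```

If the variations do not share a three-character prefix, the substr() call would need to change accordingly.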

Upvotes: 1
