LondonRob

Reputation: 78993

Convert dummy variables to continuous variables

I have group membership encoded as dummy variables as follows:

     +--------------------------+
     | group1   group2   group3 |
     |--------------------------|
  1. |      0        1        0 |
  2. |      0        0        1 |
  3. |      0        0        1 |
  4. |      0        1        0 |
  5. |      1        0        0 |
  6. |      1        0        0 |
  7. |      1        0        0 |
  8. |      1        0        0 |
     +--------------------------+

I would like to convert the three groupX variables into a single variable as follows:

group
2
3
3
2
1
1
1
1

This is roughly the "reverse" of doing xi i.group: creating a categorical variable from dummies.

I tried egen foo = group(group*), but it codes the resulting variable oddly:

     +--------------------------------+
     | group1   group2   group3   foo |
     |--------------------------------|
  1. |      0        1        0     2 |
  2. |      0        0        1     1 |
  3. |      0        0        1     1 |
  4. |      0        1        0     2 |
  5. |      1        0        0     3 |
     |--------------------------------|
  6. |      1        0        0     3 |
  7. |      1        0        0     3 |
  8. |      1        0        0     3 |
     +--------------------------------+

Notice that egen has coded group 3 as 1, and group 1 as 3.

Upvotes: 0

Views: 919

Answers (2)

Nick Cox

Reputation: 37338

Your question seems to presuppose two strong assumptions: that the variables already have suffixes running from 1 upward, and that the indicator variables are disjoint.

That being so, this is an alternative to your code:

input group1 group2 group3 
 0 1 0 
 0 0 1 
 0 0 1 
 0 1 0 
 1 0 0 
 1 0 0 
 1 0 0 
 1 0 0 
end 
gen group = group1 
forval j = 2/3 { 
    replace group = `j' if group`j' 
} 

The rationale for the ordering from egen's group(varlist) function is that its results depend on the sort order of varlist. In your case that puts 0 0 1 first, 0 1 0 second, and 1 0 0 last, because 0 sorts before 1 under Stata's rules and observations are ordered on the first variable, then on the second variable, and so forth. This egen function is designed for grouping any combination of variables, numeric (indicator or otherwise) or string.
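To see that ordering directly, here is a minimal sketch on the example data. The variable names foo and group follow the question; the final recoding step (4 - foo) is only an illustration and assumes three disjoint 0/1 indicators with every group present at least once, so it is not a general recipe.

```stata
* Sketch: reproduce the egen group() coding on the example data
clear
input group1 group2 group3
0 1 0
0 0 1
0 0 1
0 1 0
1 0 0
1 0 0
1 0 0
1 0 0
end

* group() numbers the distinct combinations in sort order:
* 0 0 1 -> 1, 0 1 0 -> 2, 1 0 0 -> 3
egen foo = group(group1 group2 group3)
sort group1 group2 group3
list group1 group2 group3 foo, sepby(foo)

* Assumption: three disjoint indicators, all groups present,
* so the codes are simply reversed and 4 - foo recovers the suffix
gen group = 4 - foo
assert group == 1*group1 + 2*group2 + 3*group3
```

If a group were absent from the data, group() would number only the combinations that occur, and the reversal would no longer line up with the suffixes.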

EDIT: The following technique is more general. A key assumption remains that 1 occurs exactly once in each observation across the set of variables. But there is no assumption about the number of variables, and the variable names need not share the same prefix: you would just need to replace group* with something giving the actual variable names.

. egen sgroup = concat(group*) 

. gen group = strpos(sgroup, "1") 

. l 

     +-------------------------------------------+
     | group1   group2   group3   sgroup   group |
     |-------------------------------------------|
  1. |      0        1        0      010       2 |
  2. |      0        0        1      001       3 |
  3. |      0        0        1      001       3 |
  4. |      0        1        0      010       2 |
  5. |      1        0        0      100       1 |
     |-------------------------------------------|
  6. |      1        0        0      100       1 |
  7. |      1        0        0      100       1 |
  8. |      1        0        0      100       1 |
     +-------------------------------------------+
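Since everything here rests on exactly one indicator being 1 in each observation, it may be worth checking that before recoding. The sketch below is one way to do it, assuming the indicators hold only 0 and 1; nones is a hypothetical helper name, and the explicit varlist stands in for whatever the actual variable names are. It then assigns the position of the indicator within the varlist, without relying on common prefixes or numeric suffixes.

```stata
* Sketch: verify the one-hot assumption, then recode by position in the varlist
clear
input group1 group2 group3
0 1 0
0 0 1
0 0 1
0 1 0
1 0 0
1 0 0
1 0 0
1 0 0
end

* Assumption: indicators are coded 0/1, so a row total of 1
* means exactly one indicator is switched on
egen nones = rowtotal(group1 group2 group3)
assert nones == 1
drop nones

* Assign the position (1, 2, 3, ...) of the indicator that equals 1
gen byte group = .
local j = 0
foreach v of varlist group1 group2 group3 {
    local ++j
    replace group = `j' if `v' == 1
}
list
```

The assert stops with an error if any observation has no 1 or more than one, which is preferable to silently getting a wrong group code.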

Upvotes: 1

LondonRob

Reputation: 78993

Here's a very clumsy way of doing it, but I'm sure there are slicker ways:

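* tag each observation so reshape can identify it
gen id = _n
* one row per observation-group pair; _j holds the suffix (1, 2, 3)
reshape long group, i(id)
* keep only the row whose indicator is 1
drop if group == 0
drop group id
rename _j group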

Which results in:

     +-----------------+
     | group   country |
     |-----------------|
  1. |     2       FOO |
  2. |     3       FOO |
  3. |     3       FOO |
  4. |     2       BAR |
  5. |     1       BAR |
  6. |     1       BAR |
  7. |     1       BAR |
  8. |     1       BAR |
     +-----------------+

Upvotes: 0
