Reputation: 78993
I have group membership encoded as dummy variables as follows:
     +--------------------------+
     | group1   group2   group3 |
     |--------------------------|
  1. |      0        1        0 |
  2. |      0        0        1 |
  3. |      0        0        1 |
  4. |      0        1        0 |
  5. |      1        0        0 |
  6. |      1        0        0 |
  7. |      1        0        0 |
  8. |      1        0        0 |
     +--------------------------+
I would like to convert the three groupX variables into a single variable as follows:
group
2
3
3
2
1
1
1
1
This is kind of the "reverse" of doing xi i.group: creating a categorical variable from dummies.
I thought egen foo = group(group*) would do it, but it seems to code the resulting variable oddly:
     +--------------------------------+
     | group1   group2   group3   foo |
     |--------------------------------|
  1. |      0        1        0     2 |
  2. |      0        0        1     1 |
  3. |      0        0        1     1 |
  4. |      0        1        0     2 |
  5. |      1        0        0     3 |
     |--------------------------------|
  6. |      1        0        0     3 |
  7. |      1        0        0     3 |
  8. |      1        0        0     3 |
     +--------------------------------+
Notice that egen has coded group 3 as 1, and group 1 as 3.
Upvotes: 0
Views: 919
Reputation: 37338
Your question seems to presuppose two strong assumptions: that the variables already have suffixes 1 up, and that the indicator variables are disjoint.
That being so, this is an alternative to your code:
input group1 group2 group3
0 1 0
0 0 1
0 0 1
0 1 0
1 0 0
1 0 0
1 0 0
1 0 0
end
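* start from group1: 1 where group1 is set, 0 elsewhere
gen group = group1
* overwrite with j wherever the j-th indicator is nonzero
forval j = 2/3 {
    replace group = `j' if group`j'
}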
The rationale for the ordering of egen, group(varlist) is that its results depend on the ordering after sort varlist, which in your case puts 0 0 1 first, 0 1 0 second, and 1 0 0 last. That is because 0 sorts before 1 under Stata's rules, and the ordering is first on the first variable, second on the second variable, and so forth. This egen function is designed for grouping any combination of variables, numeric (indicator or otherwise) or string.
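The sort rule described above can be checked directly on the example data. The sketch below is only an illustration, not a recommended general method: bar is a hypothetical variable name, and the reversed call gives the wanted coding only under the assumptions that the indicators are disjoint and that every group occurs at least once in the data.

```stata
clear
input group1 group2 group3
0 1 0
0 0 1
0 0 1
0 1 0
1 0 0
1 0 0
1 0 0
1 0 0
end

* Patterns sort as 0 0 1 < 0 1 0 < 1 0 0, so they are numbered 1, 2, 3
egen foo = group(group1 group2 group3)
assert foo == 1 if group3 == 1
assert foo == 3 if group1 == 1

* Reversing the variable order reverses the sort, hence the numbering.
* Assumption: indicators are disjoint and every group occurs at least once.
* bar is a hypothetical name used only for this illustration.
egen bar = group(group3 group2 group1)
assert bar == 1 if group1 == 1
assert bar == 2 if group2 == 1
assert bar == 3 if group3 == 1
```

If some group never occurs in the data, the numbers returned by group() close up and no longer match the suffixes, so this is a demonstration of the ordering rather than a robust conversion.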
EDIT: The following technique is more general. A key assumption remains that 1 occurs just once in each observation for a set of variables. But there are no assumptions about the number of variables, and the variable names need not have the same prefix: you would just need to replace group* by something giving the actual variable names.
. egen sgroup = concat(group*)
. gen group = strpos(sgroup, "1")
. l
     +-------------------------------------------+
     | group1   group2   group3   sgroup   group |
     |-------------------------------------------|
  1. |      0        1        0      010       2 |
  2. |      0        0        1      001       3 |
  3. |      0        0        1      001       3 |
  4. |      0        1        0      010       2 |
  5. |      1        0        0      100       1 |
     |-------------------------------------------|
  6. |      1        0        0      100       1 |
  7. |      1        0        0      100       1 |
  8. |      1        0        0      100       1 |
     +-------------------------------------------+
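Since the concat() approach relies on exactly one 1 per observation, it may be worth testing that assumption before trusting the result. A minimal sketch on the same example data follows; nones is a hypothetical variable name, and the variables are listed explicitly rather than through the group* wildcard.

```stata
clear
input group1 group2 group3
0 1 0
0 0 1
0 0 1
0 1 0
1 0 0
1 0 0
1 0 0
1 0 0
end

* Assumption check: exactly one indicator equals 1 in every observation.
* nones is a hypothetical variable name; assert stops with an error if
* any observation has no 1 or more than one 1.
egen nones = rowtotal(group1 group2 group3)
assert nones == 1

* Only then is the position of the first "1" the group number.
egen sgroup = concat(group1 group2 group3)
gen group = strpos(sgroup, "1")
```

Without the check, an observation with all zeros would silently get group 0, and one with several 1s would get the position of the first.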
Upvotes: 1
Reputation: 78993
Here's a very clumsy way of doing it, but I'm sure there are slicker ways:
gen id = _n
reshape long group, i(id)
drop if group == 0
drop group id
rename _j group
Which results in:
     +-----------------+
     | group   country |
     |-----------------|
  1. |     2       FOO |
  2. |     3       FOO |
  3. |     3       FOO |
  4. |     2       BAR |
  5. |     1       BAR |
  6. |     1       BAR |
  7. |     1       BAR |
  8. |     1       BAR |
     +-----------------+
Upvotes: 0