Reputation: 61
I have a dataset in Stata where one observation is spread out over multiple rows like the table below. The variables are string except for the id, and there exist some duplicate entries for some variables (like the last row in the table).
id | var1 | var2 | var3 |
---|---|---|---|
1 | name1 | ||
1 | name2 | ||
1 | name3 | ||
2 | name4 | ||
2 | name5 | ||
3 | name6 | ||
3 | name8 | ||
3 | name9 |
I want to take the first value and combine all variables to one row / observation. I think this is a really easy task but somehow I don't manage to figure it out.
id | var1 | var2 | var3 |
---|---|---|---|
1 | name1 | name2 | name3 |
2 | name4 | name5 | |
3 | name6 | name8 |
Upvotes: 0
Views: 1912
Reputation: 37208
I am going to assume as implied in text that name9
is really the same as name8
. That being so, here is one solution.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str5(var1 var2 var3)
1 "name1" "" ""
1 "" "name2" ""
1 "" "" "name3"
2 "name4" "" ""
2 "" "name5" ""
3 "name6" "" ""
3 "" "" "name8"
3 "" "" "name8"
end
forval j = 1/3 {
bysort id (var`j') : replace var`j' = var`j'[_N]
}
duplicates drop
+----------------------------+
| id var1 var2 var3 |
|----------------------------|
1. | 1 name1 name2 name3 |
2. | 2 name4 name5 |
3. | 3 name6 name8 |
+----------------------------+
EDIT In the event that what is wanted is the first non-missing string value, collapse
remains the solution of choice, but here is a solution without it.
clear
input byte id str5(var1 var2 var3)
1 "name1" "" ""
1 "" "name2" ""
1 "" "" "name3"
2 "name4" "" ""
2 "" "name5" ""
3 "name6" "" ""
3 "" "" "name8"
3 "" "" "name9"
end
gen long obsno = _n
forval j = 1/3 {
bysort id : egen firstnm = min(cond(var`j' != "", obsno, .))
replace var`j' = var`j'[firstnm]
drop firstnm
}
drop obsno
duplicates drop
list
Upvotes: 1
Reputation: 599
It looks like collapse
's service here.
collapse (firstnm) var*, by(id)
Upvotes: 1