TerribleStudent
TerribleStudent

Reputation: 61

Combine several rows into one observation

I have a dataset in Stata where one observation is spread out over multiple rows like the table below. The variables are string except for the id, and there exist some duplicate entries for some variables (like the last row in the table).

id var1 var2 var3
1 name1
1 name2
1 name3
2 name4
2 name5
3 name6
3 name8
3 name9

I want to take the first value and combine all variables to one row / observation. I think this is a really easy task but somehow I don't manage to figure it out.

id var1 var2 var3
1 name1 name2 name3
2 name4 name5
3 name6 name8

Upvotes: 0

Views: 1912

Answers (2)

Nick Cox
Nick Cox

Reputation: 37208

I am going to assume as implied in text that name9 is really the same as name8. That being so, here is one solution.

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str5(var1 var2 var3)
1 "name1" ""      ""     
1 ""      "name2" ""     
1 ""      ""      "name3"
2 "name4" ""      ""     
2 ""      "name5" ""     
3 "name6" ""      ""     
3 ""      ""      "name8"
3 ""      ""      "name8"
end

forval j = 1/3 { 
    bysort id (var`j') : replace var`j' = var`j'[_N]
}

duplicates drop 


     +----------------------------+
     | id    var1    var2    var3 |
     |----------------------------|
  1. |  1   name1   name2   name3 |
  2. |  2   name4   name5         |
  3. |  3   name6           name8 |
     +----------------------------+

EDIT In the event that what is wanted is the first non-missing string value, collapse remains the solution of choice, but here is a solution without it.

clear
input byte id str5(var1 var2 var3)
1 "name1" ""      ""     
1 ""      "name2" ""     
1 ""      ""      "name3"
2 "name4" ""      ""     
2 ""      "name5" ""     
3 "name6" ""      ""     
3 ""      ""      "name8"
3 ""      ""      "name9"
end

gen long obsno = _n 
forval j = 1/3 {
    bysort id : egen firstnm = min(cond(var`j' != "", obsno, .))
    replace var`j' = var`j'[firstnm]
    drop firstnm
} 

drop obsno 

duplicates drop 

list 

Upvotes: 1

Romalpa Akzo
Romalpa Akzo

Reputation: 599

It looks like collapse's service here.

collapse (firstnm) var*, by(id)

Upvotes: 1

Related Questions