Reputation: 61

Combine several rows into one observation

I have a dataset in Stata where one observation is spread out over multiple rows like the table below. The variables are string except for the id, and there exist some duplicate entries for some variables (like the last row in the table).

id	var1	var2	var3
1	name1
1		name2
1			name3
2	name4
2		name5
3	name6
3			name8
3			name9

I want to take the first value and combine all variables to one row / observation. I think this is a really easy task but somehow I don't manage to figure it out.

id	var1	var2	var3
1	name1	name2	name3
2	name4	name5
3	name6		name8

Upvotes: 0

Answers (2)

Nick Cox

Reputation: 37208

I am going to assume as implied in text that name9 is really the same as name8. That being so, here is one solution.

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str5(var1 var2 var3)
1 "name1" ""      ""     
1 ""      "name2" ""     
1 ""      ""      "name3"
2 "name4" ""      ""     
2 ""      "name5" ""     
3 "name6" ""      ""     
3 ""      ""      "name8"
3 ""      ""      "name8"
end

forval j = 1/3 { 
    bysort id (var`j') : replace var`j' = var`j'[_N]
}

duplicates drop 


     +----------------------------+
     | id    var1    var2    var3 |
     |----------------------------|
  1. |  1   name1   name2   name3 |
  2. |  2   name4   name5         |
  3. |  3   name6           name8 |
     +----------------------------+

EDIT In the event that what is wanted is the first non-missing string value, collapse remains the solution of choice, but here is a solution without it.

clear
input byte id str5(var1 var2 var3)
1 "name1" ""      ""     
1 ""      "name2" ""     
1 ""      ""      "name3"
2 "name4" ""      ""     
2 ""      "name5" ""     
3 "name6" ""      ""     
3 ""      ""      "name8"
3 ""      ""      "name9"
end

gen long obsno = _n 
forval j = 1/3 {
    bysort id : egen firstnm = min(cond(var`j' != "", obsno, .))
    replace var`j' = var`j'[firstnm]
    drop firstnm
} 

drop obsno 

duplicates drop 

list

Upvotes: 1

Romalpa Akzo

Reputation: 599

It looks like collapse's service here.

collapse (firstnm) var*, by(id)

Upvotes: 1

Combine several rows into one observation

Answers (2)

Related Questions