trinitysara

Reputation: 185

How to combine consecutive observations together by group in Stata?

I have a dataset of interviews between doctors and patients. The variables are QuestionNumber; Speaker, which indicates whether the doctor (MD) or the patient (P) is speaking; Speech, which contains what the speaker said; and Row, which orders the dataset chronologically.

Row    QuestionNumber    Speaker    Speech
 1     1                 MD         Permission to record?
 2     1                 P          Yes
 3     1                 MD         Great
 4     2                 MD         I'd like to ask you-
 5     2                 MD         What was that?
 6     2                 P          Excuse me (blows nose)

Within each question number, I would like the speakers to alternate from one observation (row) to the next. However, as you can see in question 2, MD speaks twice in a row (rows 4 and 5). I would like to combine the speech in such observations, i.e., concatenate the text whenever the same speaker speaks consecutively within the same question number.

I would like to have the final dataset look like this:

Row    QuestionNumber    Speaker    SpeechNEW
 1     1                 MD         Permission to record?
 2     1                 P          Yes
 3     1                 MD         Great
 4     2                 MD         I'd like to ask you- What was that?
 5     2                 P          Excuse me (blows nose)

I can't seem to find an existing solution online. Any advice would be appreciated--thanks!

Upvotes: 0

Views: 828

Answers (1)

Nick Cox

Reputation: 37208

clear 
input Row    QuestionNumber    str2 Speaker    str42 Speech
 1     1                 MD         "Permission to record?"
 2     1                 P          "Yes"
 3     1                 MD         "Great"
 4     2                 MD         "I'd like to ask you-"
 5     2                 MD         "What was that?"
 6     2                 P          "Excuse me (blows nose)"
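end

* Number each run of consecutive rows by the same speaker within a question
bysort QuestionNumber (Row) : generate Comment = sum(Speaker != Speaker[_n-1])
* Within each run, append each row's speech to the text accumulated so far
bysort QuestionNumber Comment (Row) : replace Speech = Speech[_n-1] + " " + Speech if _n > 1
* Keep only the last row of each run, which now holds the combined text
by QuestionNumber Comment : keep if _n == _N

list, sepby(QuestionNumber)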

     +--------------------------------------------------------------------------+
     | Row   Questi~r   Speaker                                Speech   Comment |
     |--------------------------------------------------------------------------|
  1. |   1          1        MD                 Permission to record?         1 |
  2. |   2          1         P                                   Yes         2 |
  3. |   3          1        MD                                 Great         3 |
     |--------------------------------------------------------------------------|
  4. |   5          2        MD   I'd like to ask you- What was that?         1 |
  5. |   6          2         P                Excuse me (blows nose)         2 |
     +--------------------------------------------------------------------------+
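
The listing above keeps the original Row values (5 and 6 for the last two observations) and the variable name Speech, whereas the question asks for consecutive row numbers and a variable called SpeechNEW. A minimal sketch of that cleanup follows. It assumes the commands above have just been run, that the original Row values still reflect chronological order, and that the helper variable Comment is no longer needed:

```stata
* Continuation sketch: run after the commands above
* Restore chronological order, then renumber rows 1, 2, 3, ...
sort Row
replace Row = _n

* Match the variable name used in the desired output
rename Speech SpeechNEW

* Drop the helper variable that identified runs of the same speaker
drop Comment

list, sepby(QuestionNumber)
```

Renumbering discards the link back to the original row numbers, so it may be worth copying Row into another variable first if that link matters.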

Upvotes: 1
