Reputation: 185
I have a dataset of interviews between doctors and patients. It has four variables: QuestionNumber; Speaker, which indicates whether the doctor (MD) or the patient (P) is speaking; Speech, which contains what the speaker said; and Row, which orders the dataset chronologically.
Row  QuestionNumber  Speaker  Speech
1    1               MD       Permission to record?
2    1               P        Yes
3    1               MD       Great
4    2               MD       I'd like to ask you-
5    2               MD       What was that?
6    2               P        Excuse me (blows nose)
Within each question number, I would like the speakers to alternate, so that no two consecutive observations (rows) have the same speaker. However, as you can see in question 2, MD speaks twice in a row (rows 4 and 5). I would like to combine the speech in such observations, i.e., concatenate the text whenever the same speaker speaks consecutively within the same question number.
I would like to have the final dataset look like this:
Row  QuestionNumber  Speaker  SpeechNEW
1    1               MD       Permission to record?
2    1               P        Yes
3    1               MD       Great
4    2               MD       I'd like to ask you- What was that?
5    2               P        Excuse me (blows nose)
I can't seem to find an existing solution online. Any advice would be appreciated--thanks!
Upvotes: 0
Views: 828
Reputation: 37208
clear
input Row QuestionNumber str2 Speaker str42 Speech
1 1 MD "Permission to record?"
2 1 P "Yes"
3 1 MD "Great"
4 2 MD "I'd like to ask you-"
5 2 MD "What was that?"
6 2 P "Excuse me (blows nose)"
end
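* Number each run of consecutive rows by the same speaker within a question
* (Question is an abbreviation of QuestionNumber; Speaker[_n-1] is "" on the first row)
bysort Question (Row) : generate Comment = sum(Speaker != Speaker[_n-1])
* Within each run, accumulate the text so that the last row holds the full speech
bysort Question Comment (Row) : replace Speech = Speech[_n-1] + " " + Speech if _n > 1
* Keep only the last row of each run
by Question Comment : keep if _n == _N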
list, sepby(Question)
     +--------------------------------------------------------------------------+
     | Row   Questi~r   Speaker                                Speech   Comment |
     |--------------------------------------------------------------------------|
  1. |   1          1        MD                 Permission to record?         1 |
  2. |   2          1         P                                   Yes         2 |
  3. |   3          1        MD                                 Great         3 |
     |--------------------------------------------------------------------------|
  4. |   5          2        MD   I'd like to ask you- What was that?         1 |
  5. |   6          2         P                Excuse me (blows nose)         2 |
     +--------------------------------------------------------------------------+
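The listing above keeps the original Row values (5 and 6 for question 2) and the helper variable Comment, whereas the layout requested in the question has Row running from 1 to 5 and the combined text in SpeechNEW. A minimal sketch of one way to tidy up, assuming it is run right after the keep step and that Row orders the whole dataset chronologically, as the question describes (this follow-up is an illustration, not part of the answer as posted):

```stata
* Hypothetical clean-up step, run after the -keep- command above.
* Assumes Row is unique across the dataset and reflects chronological order.
sort Row
replace Row = _n               // renumber the remaining rows 1, 2, ...
drop Comment                   // helper variable is no longer needed
rename Speech SpeechNEW        // match the variable name used in the question
list, sepby(QuestionNumber)
```

Renumbering with _n after sorting on Row preserves the chronological order while closing the gaps left by the dropped rows.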
Upvotes: 1