user42459

Reputation: 915

Changing ID from the nth row to the last row if something happens at the nth row

My data has a problem. The survey is conducted at the housing-unit level, so two rows with the same person ID might not actually refer to the same person.

I want to assign a different ID to each person who is actually different.

Let's say I have this data.

id  yearmonth  age 
1   200001      12
1   200002      12
1   200003      14
1   200004      14
1   200005      14

The 3rd row is definitely a different person: the age increases by 2 from one month to the next.

So I want to change the ID like this:

id  yearmonth  age
1   200001      12
1   200002      12
10  200003      14
10  200004      14
10  200005      14

How can I do this? I think I can change the ID of 3rd row by writing

bysort id (yearmonth): replace id=id*10 if age[_n-1]>age+1 | age[_n-1]+1<age

(I multiply by 10 because all IDs have the same number of digits, so multiplying by 10 won't create any duplicates.)

But how can I change all subsequent rows?

Upvotes: 0

Views: 56

Answers (1)

user4690969


Building on what you have, something like this might do what you want.

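* flag rows (after the first) where age differs from the previous row by more than 1
bysort id (yearmonth): generate idchange = _n>1 & (age[_n-1]>age+1 | age[_n-1]+1<age)
* running count of changes within each original id
bysort id (yearmonth): generate numchange = sum(idchange)
replace id = 10*id + (numchange-1) if numchange>0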

Note that this also handles the case where two or more changes are detected within one original id, at least for up to 10 changes. Beyond that, the added last digit overflows and the new ids could collide.

id  yearmonth  age 
2   200001      12
2   200002      14
2   200003      15
2   200004      18
2   200005      18
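
Taking the sample rows above (one id with two age jumps), a self-contained sketch of the approach might look like the following. It rests on two assumptions about the intent: the final replace uses the running count numchange rather than the 0/1 flag, and an _n>1 guard is added because age[_n-1] is missing on the first row of each id, and Stata treats missing as larger than any number, so the first row would otherwise be flagged as a change.

```stata
clear
input id yearmonth age
2 200001 12
2 200002 14
2 200003 15
2 200004 18
2 200005 18
end

* flag rows where age differs from the previous row by more than 1;
* _n>1 keeps the first row of each id from being flagged (assumption, see above)
bysort id (yearmonth): generate byte idchange = _n>1 & (age[_n-1]>age+1 | age[_n-1]+1<age)

* running count of detected changes within each original id
bysort id (yearmonth): generate numchange = sum(idchange)

* rows before the first change keep the original id;
* the k-th detected person gets 10*id + (k-1)
replace id = 10*id + (numchange-1) if numchange>0

list id yearmonth age numchange, noobs
```

With this input, numchange runs 0, 1, 1, 2, 2, so the ids come out as 2, 20, 20, 21, 21. This relies on all original ids having the same number of digits, as stated in the question.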

Upvotes: 1
