user1659943

Reputation: 57

Create a dummy variable for the last rows based on another variable

I would like to create a dummy variable that uses the variable "count" to flag the last rows of each id with 1. For example, ID 1 has a count of 3, so its last three rows are flagged and the pattern is 0,0,1,1,1. Similarly, ID 4, which has a count of 1, has the pattern 0,0,0,1. The IDs have different numbers of rows. The variable "wish" shows the output I want to obtain.

input  byte id count wish str9 date
1   3   0   22sep2006
1   3   0   23sep2006
1   3   1   24sep2006
1   3   1   25sep2006
1   3   1   26sep2006
2   4   1   22mar2004
2   4   1   23mar2004
2   4   1   24mar2004
2   4   1   25mar2004
3   2   0   28jan2003
3   2   0   29jan2003
3   2   1   30jan2003
3   2   1   31jan2003
4   1   0   02dec1993
4   1   0   03dec1993
4   1   0   04dec1993
4   1   1   05dec1993
5   1   0   08feb2005
5   1   0   09feb2005
5   1   0   10feb2005
5   1   1   11feb2005
6   3   0   15jan1999
6   3   0   16jan1999
6   3   1   17jan1999
6   3   1   18jan1999
6   3   1   19jan1999
end 

Upvotes: 0

Views: 149

Answers (2)

Roberto Ferrer

Reputation: 11102

For future questions, you should show what you have tried. This shows that you have done your part, namely, researching your problem.

One way is:

clear
set more off

*----- example data -----

input ///
byte id count wish str9 date
1   3   0   22sep2006
1   3   0   23sep2006
1   3   1   24sep2006
1   3   1   25sep2006
1   3   1   26sep2006
2   4   1   22mar2004
2   4   1   23mar2004
2   4   1   24mar2004
2   4   1   25mar2004
3   2   0   28jan2003
3   2   0   29jan2003
3   2   1   30jan2003
3   2   1   31jan2003
4   1   0   02dec1993
4   1   0   03dec1993
4   1   0   04dec1993
4   1   1   05dec1993
5   1   0   08feb2005
5   1   0   09feb2005
5   1   0   10feb2005
5   1   1   11feb2005
6   3   0   15jan1999
6   3   0   16jan1999
6   3   1   17jan1999
6   3   1   18jan1999
6   3   1   19jan1999
end 

list, sepby(id)

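*----- what you want -----

* _n is the row number within id and _N is the number of rows in that id;
* the comparison is 1 for the last `count' rows of each id and 0 otherwise
bysort id: gen wish2 = _n > (_N - count)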

list, sepby(id)

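This assumes the rows are already in date order within each id. Note that "date" is stored as a string here, so sorting on it directly would be alphabetical, not chronological.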
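If the rows are not guaranteed to be in date order, the sort can be made explicit. The following is only a sketch: it reuses the rows for ids 1 and 4 from the question, entered out of order on purpose, and the names "ndate" and "wish3" are illustrative rather than anything from the original data.

```stata
clear
* Rows for ids 1 and 4 from the question, deliberately entered out of date order
input byte id count wish str9 date
1   3   1   26sep2006
1   3   0   22sep2006
1   3   1   24sep2006
1   3   0   23sep2006
1   3   1   25sep2006
4   1   1   05dec1993
4   1   0   02dec1993
4   1   0   04dec1993
4   1   0   03dec1993
end

* Convert the string date to a numeric daily date (ndate is an illustrative name)
gen ndate = daily(date, "DMY")
format ndate %td

* Sort by id and, within id, by ndate; variables in parentheses
* are used for sorting only and do not define the by-groups
bysort id (ndate): gen wish3 = _n > (_N - count)

* Stops with an error if any row disagrees with the target from the question
assert wish3 == wish

list, sepby(id)
```

Putting the date in parentheses keeps the by-groups defined by id alone while fixing the row order inside each group, so the result does not depend on how the rows happened to be entered.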

Upvotes: 3

Sam Larson

Reputation: 91

One way to accomplish this is with within-group row numbers, built with 'bysort'-type logic:

***Create variable of within-group row numbers.

bysort id: gen obsnum = _n

***Calculate total number of rows within each group.

by id: egen max_obsnum = max(obsnum)

***Subtract the count variable from the group row count.
***This is the number of rows where we want the dummy to equal zero.

gen max_obsnum_less_count = max_obsnum - count

***Create the dummy to equal one when the row number is
***greater than this last variable.

gen dummy = (obsnum > max_obsnum_less_count)

***Clean up.

drop obsnum max_obsnum max_obsnum_less_count

Upvotes: 2
