Reputation: 551
I'd like to create a variable (counter) which counts non-missing values of another variable (firmage0) in every year. My code looks like this:
sort year
quietly by year: gen counter = _n if firmage0 != .
Unfortunately, Stata starts counting with 1 even if there are missing values. But I'd like to start counting when there is a non-missing value. In addition the second nonmissing value should have the value "2", the third "3" and so on. There is one important restriction. Don't use preserve and restore! After creating the variable counter, my dataset looks like this (this is only a small part of it). Free space means missing value!
year firmage0 counter
1975
1975
1975
1975
1975 21 5
1975
1975 60 7
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975 115 20
1975
1975
1975
But the dataset should look like this:
year firmage0 counter
1975
1975
1975
1975
1975 21 1
1975
1975 60 2
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975 115 3
1975
1975
1975
Upvotes: 3
Views: 7970
Reputation: 11102
The system variable _n
is fixed in the sense that it denotes the number of the current observation independently of your if
condition. What you are asking for with your code is a counter
that takes on the value of the number of the observation if it is not missing. This is different from the count of non-missing observations up until a certain point, which is what you really want.
Be aware of the
data type of your missings. As you present it, firmage0
seems like a string variable because
blanks are missings for strings. Missings for numeric values are
represented as .
(or .<something>
). See help missings
.
One solution is doing a cumulative sum
conditional on the missings
. I assume variable type is numeric:
clear all
set more off
input year firmage0
1975 .
1975 .
1975 .
1975 .
1975 21
1980 60
1980 .
1980 89
1985 .
1985 .
1985 23
1985 45
1985 47
1985 .
end
list, sepby(year)
bysort year: gen counter = sum(!missing(firmage0))
replace counter = . if missing(firmage0) // optional
list, sepby(year)
!missing(firmage0)
evaluates to false (0) or true (1) depending on whether the observation for firmage0
is missing or not, respectively. Then sum
does the cumulative sum of that.
If the variable is in fact of string type, then replace the corresponding line with:
replace counter = "" if missing(firmage0) // optional
A one-liner (for numeric type) is:
bysort year: gen counter = cond(missing(firmage0), ., sum(!missing(firmage0)))
but I prefer the first way because it is easier to read.
Upvotes: 3
Reputation: 35
I believe that if you sort by year
and firmage0
before:
. sort year firmage0
you will have your problem solved. I changed the year of one of the firmage0
values just to confirm that it would restart the count.
. by year : gen counter = _n if firmage0 !=.
year firmage0 counter
1975 21 1
1975 60 2
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1975
1976 115 1
Upvotes: 1