Reputation: 551
My dataset looks like this:
firm_id year total_workers
1 1975 614
1 1976 68
1 1977 708
1 1978 18
1 1979 536
3 1975 154
3 1976 59
3 1977 115
3 1978 40
3 1979 380
4 1975 49
4 1976 42
4 1977 53
4 1978 54
4 1979 34
5 1975 254
5 1976 1115
and so on...
I'd like Stata to display every firm_id and the corresponding year in which the value of total_workers is more than 10 times larger than the value of total_workers in another year within the same firm_id. For example, for firm_id == 1, Stata should display firm_id == 1 and year == 1977 (it doesn't have to be displayed exactly this way), since in 1977 total_workers was more than 10 times larger than total_workers in 1976.
Since the display command doesn't seem to work in this context, I used tab, which is a rather unsatisfactory substitute. Even so, my overall command doesn't work. Here is my first attempt:
by firm_id: tab firm_id year if total_workers >10*total_workers
As you can see, the if condition is not specified the way it should be.
Upvotes: 0
Views: 87
Reputation: 11102
It seems you only want to compare consecutive years (by firm), which can be done like this:
clear all
set more off
input firm_id year total_workers
1 1975 614
1 1976 68
1 1977 708
1 1978 18
1 1979 536
3 1975 154
3 1976 59
3 1977 115
3 1978 40
3 1979 380
4 1975 49
4 1976 42
4 1977 53
4 1978 54
4 1979 34
5 1975 254
5 1976 1115
end
sort firm_id year // important
list, sepby(firm_id)
by firm_id: gen flag = (total_workers[_n] > 10*total_workers[_n-1])
list if flag == 1
The important points are the sort and the use of subscripting.
To make it one line shorter, you can incorporate the sort into the main instruction like this:
bysort firm_id (year): gen flag = (total_workers[_n] > 10*total_workers[_n-1])
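To see what the subscript is actually comparing, it can help to store the previous year's value in its own variable. The following is only a sketch on a subset of the example data; prev_workers and ratio are illustrative names that are not part of the original code.

```stata
clear
input firm_id year total_workers
1 1975 614
1 1976 68
1 1977 708
1 1978 18
1 1979 536
3 1975 154
3 1976 59
3 1977 115
3 1978 40
3 1979 380
end

* Previous year's value within each firm; missing for each firm's first year
bysort firm_id (year): gen prev_workers = total_workers[_n-1]

* Growth factor relative to the previous year (missing where prev_workers is missing)
gen ratio = total_workers / prev_workers

* Stata treats missing as larger than any number, so exclude it explicitly
list firm_id year total_workers prev_workers ratio ///
    if ratio > 10 & !missing(ratio), sepby(firm_id)
```

Because a missing value counts as larger than any number in Stata, a condition written as ratio > 10 alone would also pick up the first year of every firm, hence the !missing(ratio) guard.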
The reason your code doesn't work is that Stata evaluates the if condition observation by observation, from top to bottom (corrections are welcome if I have this wrong). Within a single observation, total_workers refers to that observation's own value, so you are asking Stata to tabulate only when a value is larger than itself multiplied by 10, which is impossible for a non-negative count like this one (i.e. always false). See the output of the following:
bysort firm_id (year): gen flag = 1 if total_workers > 10*total_workers
Using subscripts explicitly, the previous line is equivalent to
bysort firm_id (year): gen flag = 1 if total_workers[_n] > 10*total_workers[_n]
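If the comparison is instead meant to be against any other year of the same firm, which is one possible reading of the question, a year qualifies exactly when it exceeds 10 times the firm's smallest value. The sketch below assumes that reading; min_workers and flag_any are illustrative names, and only a subset of the example data is used.

```stata
clear
input firm_id year total_workers
1 1975 614
1 1976 68
1 1977 708
1 1978 18
1 1979 536
3 1975 154
3 1976 59
3 1977 115
3 1978 40
3 1979 380
end

* Smallest total_workers within each firm (egen's min() ignores missing values)
bysort firm_id: egen min_workers = min(total_workers)

* Flag years that are more than 10 times the firm's minimum;
* guard against missing, which Stata treats as larger than any number
gen flag_any = (total_workers > 10*min_workers) if !missing(total_workers)

list firm_id year total_workers min_workers if flag_any == 1, sepby(firm_id)
```

Unlike the consecutive-year version, this does not depend on the sort order within firm, because the minimum is the same for every observation of a firm.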
Upvotes: 1