the_economist

Reputation: 551

if condition depending on existence

My dataset looks like this:

firm_id year    total_workers
   1    1975    614
   1    1976    68
   1    1977    708
   1    1978    18
   1    1979    536
   3    1975    154
   3    1976    59
   3    1977    115
   3    1978    40
   3    1979    380
   4    1975    49
   4    1976    42
   4    1977    53
   4    1978    54
   4    1979    34
   5    1975    254
   5    1976    1115

and so on...

I'd like Stata to display every firm_id and year in which total_workers is more than 10 times its value in another year within the same firm_id. For example, for firm_id == 1, Stata should display firm_id == 1 and year == 1977 (not necessarily in exactly this form), since total_workers in 1977 was more than 10 times its value in 1976.

Since the command display doesn't seem to work in this context, I used the command tab instead, which is a less than satisfactory substitute. Even so, my command doesn't work. Here is my first attempt:

  by firm_id: tab firm_id year if total_workers >10*total_workers

As you can see, the if condition is not specified in the way it should be.

Upvotes: 0

Views: 87

Answers (1)

Roberto Ferrer

Reputation: 11102

It seems you only want to compare consecutive years (by firm), which can be done like this:

clear all
set more off

input firm_id year total_workers
    1 1975 614
    1 1976 68
    1 1977 708
    1 1978 18
    1 1979 536
    3 1975 154
    3 1976 59
    3 1977 115
    3 1978 40
    3 1979 380
    4 1975 49
    4 1976 42
    4 1977 53
    4 1978 54
    4 1979 34
    5 1975 254
    5 1976 1115
end

sort firm_id year // important
list, sepby(firm_id)

by firm_id: gen flag = (total_workers[_n] > 10*total_workers[_n-1])
list if flag == 1 

The important points are the sort and the use of subscripting.

To make it one line shorter, you can incorporate the sort into the main instruction like this:

bysort firm_id (year): gen flag = (total_workers[_n] > 10*total_workers[_n-1])
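
If some firms have gaps in their years, [_n-1] refers to the previous observation rather than the previous calendar year. A minimal sketch of an alternative, assuming the data can be declared as a panel with xtset, uses the lag operator L. and also guards against missing values (Stata treats missing as larger than any number, so a missing total_workers would otherwise be flagged). The variable name flag_ts is only illustrative, and the data are the first two firms from the question.

```stata
clear
input firm_id year total_workers
    1 1975 614
    1 1976 68
    1 1977 708
    1 1978 18
    1 1979 536
    3 1975 154
    3 1976 59
    3 1977 115
    3 1978 40
    3 1979 380
end

// declare the panel structure; xtset also sorts by firm_id and year
xtset firm_id year

// L.total_workers is the previous year's value (missing if that year is absent)
// flag_ts is a hypothetical name, not part of the original answer
gen byte flag_ts = (total_workers > 10*L.total_workers) & !missing(total_workers, L.total_workers)

list firm_id year total_workers if flag_ts == 1
```

With these data the rule flags 1977 and 1979 for firm 1, the same observations as the subscripting approach, since there are no gaps or missing values here.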

The reason your code doesn't work is that Stata evaluates the if condition observation by observation, from top to bottom (corrections welcome if I have this wrong). You are asking Stata to tabulate only those observations in which the variable is larger than 10 times itself, which is never true for non-negative values such as worker counts. See the output of the following:

bysort firm_id (year): gen flag2 = 1 if total_workers > 10*total_workers // new name, since flag already exists

Using subscripts explicitly, the previous line is equivalent to

bysort firm_id (year): gen flag3 = 1 if total_workers[_n] > 10*total_workers[_n]
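
If "another year" in the question means any other year of the same firm rather than only the previous one, one possible approach is to compare each value with the firm's smallest value. This is a sketch under that assumption, not something the question confirms; the names min_workers and flag_any are illustrative, and the data are again the first two firms from the question.

```stata
clear
input firm_id year total_workers
    1 1975 614
    1 1976 68
    1 1977 708
    1 1978 18
    1 1979 536
    3 1975 154
    3 1976 59
    3 1977 115
    3 1978 40
    3 1979 380
end

// smallest total_workers within each firm; egen's min() ignores missing values
bysort firm_id: egen min_workers = min(total_workers)

// flag years exceeding 10 times the firm's smallest value (hypothetical variable name)
gen byte flag_any = (total_workers > 10*min_workers) & !missing(total_workers)

list firm_id year total_workers if flag_any == 1
```

For firm 1 this flags 1975, 1977 and 1979, because each exceeds 10 times the 1978 value of 18; firm 3 has no flagged years, since its largest value (380) does not exceed 10 times its smallest (40).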

Upvotes: 1
