Linda Wang
Linda Wang

Reputation: 31

Calculate the number of firms at a given month

I'm working on a dataset in Stata The first column is the name of the firm. the second column is the start date of this firm and the third column is the expiration date of this firm. If the expdate is missing, this firm is still in business. I want to create a variable that will record the number of firms at a given time. (preferably to be a monthly variable)

I'm really lost here. Please help!

Upvotes: 1

Views: 417

Answers (1)

Arthur Morris
Arthur Morris

Reputation: 1348

Next time, try using dataex (ssc install dataex) rather than a screen shot, this is recommended in the Stata tag wiki, and will help others help you!

Here is an example for how to count the number of firms that are alive in each period (I'll use years, but point out where you can switch to month). This example borrows from Nick Cox's Stata journal article on this topic.

First, load the data:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(firmID dt_start dt_end)
3923155 20080123 99991231
2913168 20070630 99991231
3079566 20000601 20030212
3103920 20020805 20070422
3357723 20041201 20170407
4536020 20120201 20170407
2365954 20070630 20190630
4334271 20110721 20191130
4334338 20110721 20170829
4334431 20110721 20190429
end

Note that my in my example data my dates are not in Stata format, so I'll convert them here:

tostring dt_start, replace
generate startdate=date(dt_start, "YMD") 
tostring dt_end, replace
generate enddate=date(dt_end, "YMD") 
format startdate enddate

Next make a variable with the time interval you'd like to count within:

generate startyear = year(startdate)
generate endyear = year(enddate)

In my dataset I have missing end dates that begin with '9999' while you have them as '.' I'll set these to the current year, the assumption being that the dataset is current. You'll have to decide whether this is appropriate in your data.

replace endyear = year(date("$S_DATE","DMY")) if endyear == 9999

Next create an observation for the first and last years (or months) that the firm is alive:

expand 2
by firmID, sort: generate year = cond(_n == 1, startyear, endyear)
keep firmID year
duplicates drop // keeps one observation for firms that die in the period they were born

Now expand the dataset to have an observation for every period between the start and end date. For this I use tsfill.

xtset firmID year
tsfill

Now I have one observation per existing firm in each period. All that remains is to count the observations by year:

egen entities = count(firmID), by(year)
drop firmID
duplicates drop

Upvotes: 1

Related Questions