Reputation: 77
There is a data set of the following structure:
firm_ID partner_ID start_date end_date
1 2 01jan2001 15mar2001
1 3 15feb2001 30apr2001
2 4 20mar2001 14may2001
The data set shows the duration of partnership of firm_ID with partner_ID. Here, firm_ID and partner_ID (both integers) refer to a unique identifier of the firms .
Using Stata, I want to create a summary table, which shows for each unique firm_ID the number of its partnerships, by month and year. If the partnership existed for at least one day in a given month, it should be counted. If firm 1 reports a partnership with firm 2 but firm 2 doesn't report a partnership with firm 1, then it should only be counted for firm 1 and not firm 2.
For the example above, the summary table is:
firm_ID 01/2001 02/2001 03/2001 04/2001 05/2001
1 1 2 2 1 0
2 0 0 1 1 1
The variable firm_ID has 8,000 unique observations and the time span is 01/2001 - 12/2013. Thus, the resulting summary table should have the number of rows equal to 8,000 and the number of columns equal to 12*13=156 (number of months during 01/2001 - 12/2013).
Conceptually, what is the algorithm of solving this task using Stata? Thank you.
Upvotes: 0
Views: 35
Reputation: 37183
You need to work with monthly dates. Note in passing that daily dates presented as you gave them need reverse engineering to be part of calculations. Type ssc inst dataex
for a command to generate reproducible Stata data examples for public forums.
tabulate
works for your toy example; for your real example, you will need a different tabulation to cope with many more rows and columns: see for example groups
from SSC.
clear
input firm_ID partner_ID str9 (s_start_date s_end_date)
1 2 01jan2001 15mar2001
1 3 15feb2001 30apr2001
2 4 20mar2001 14may2001
end
foreach v in start end {
gen `v'_date = daily(s_`v'_date, "DMY")
gen `v'_month = mofd(`v'_date)
format `v'_date %td
format `v'_month %tm
}
gen duration = end_month - start_month + 1
expand duration
bysort firm_ID partner_ID : gen month = start_month + _n - 1
format month %tm
tab firm_ID month
| month
firm_ID | 2001m1 2001m2 2001m3 2001m4 2001m5 | Total
-----------+-------------------------------------------------------+----------
1 | 1 2 2 1 0 | 6
2 | 0 0 1 1 1 | 3
-----------+-------------------------------------------------------+----------
Total | 1 2 3 2 1 | 9
groups firm_ID month, fillin show(f) sepby(firm_ID)
+--------------------------+
| firm_ID month Freq. |
|--------------------------|
| 1 2001m1 1 |
| 1 2001m2 2 |
| 1 2001m3 2 |
| 1 2001m4 1 |
| 1 2001m5 0 |
|--------------------------|
| 2 2001m1 0 |
| 2 2001m2 0 |
| 2 2001m3 1 |
| 2 2001m4 1 |
| 2 2001m5 1 |
+--------------------------+
Upvotes: 1