Reputation: 13
An unmarried couple is living together in a house with other people. To isolate how much that couple makes I need to add the two incomes together. I am using variables that act as pointers that give the partners_id. Using the partners_id, id, and individual_income, how do I apply a partner's income to his/her partner?
This was my attempt below:
summarize id, meanonly
capture gen partners_income = 0
forvalue ln = 1/`r(max)' {
bys household (id): ///
egen link_`ln' = total(individual_income) if partners_location==`ln')
replace partners_income = link_`ln' if link_`ln' > 0 & id == `ln'
drop link_*
}
Upvotes: 1
Views: 340
Reputation: 37208
There is general advice in this FAQ.
It can take longer to write a smart way to do this than to use a quick-and-dirty approach.
However, there is a smarter way.
Brute solution
Quick here means relatively quick to code; this isn't guaranteed quick for a very large dataset.
gen partners_income = .
gen problem = 0
The proper initialisation of the partner's income variable is to missing, not zero. Not knowing an income and the income being zero are different conditions. For example, if someone doesn't have a partner, the income will certainly be missing. (If at a later stage, you want to treat missings as zeros, that's up to you, but you should keep them distinct at this stage.)
The reason for the problem variable will become apparent.
I can't see a reason for your capture.
Now we can loop:
quietly forval i = 1/`=_N' {
su individual_income if id == partners_id[`i'], meanonly
replace partners_income = r(max) in `i'
if r(N) > 1 replace problem = r(N) in `i'
}
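So, the logic is:
For each observation, summarize individual_income over the observations whose id equals that observation's partners_id.
summarize, meanonly is fast. If it finds exactly one value, it does not matter which statistic we take, as the maximum, minimum, and mean are then all the same.
If summarize finds more than one value, something is not as assumed (mistakes over identifiers, or multiple partners); later we edit if problem and look at those observations.
Notes: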
We can make comparison safer by restricting computations to the same household by modifying
if id == partners_id[`i']
to
if id == partners_id[`i'] & household == household[`i']
In one place you have the variable partners_location, which looks like a typo for partners_id.
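The brute loop with the household restriction can be tried on a small made-up dataset. This is only a sketch: the five observations below are invented for illustration, and the variable names follow the question (household, id, partners_id, individual_income).

```stata
* Toy data, invented for illustration only
clear
input household id partners_id individual_income
1 1 2 30000
1 2 1 45000
1 3 . 20000
2 4 5 52000
2 5 4 .
end

gen partners_income = .
gen problem = 0

* For each observation, look up the income of the person whose id
* matches this observation's partners_id, within the same household
quietly forval i = 1/`=_N' {
    su individual_income if id == partners_id[`i'] & household == household[`i'], meanonly
    replace partners_income = r(max) in `i'
    if r(N) > 1 replace problem = r(N) in `i'
}

list, sepby(household)
```

In this toy data, the person with no partner (id 3) and the person whose partner's income is unknown (id 4) both keep a missing partners_income, which is the reason for initialising to missing rather than zero.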
Cute solution
Assuming that partners name each other as partner (and this is not the forum to explore exceptions), then couples have a joint identity which we obtain by sorting "John Joanna" and "Joanna John" to "Joanna John" or the equivalent with numeric identifiers:
gen first = cond(id < partners_id, id, partners_id)
gen second = cond(id < partners_id, partners_id, id)
egen joint = concat(first second), p(" ")
first and second just mean first and second in numeric or alphanumeric order; this works for numeric and string identifiers. You may need to slap on an exclusion clause such as
if !missing(partners_id)
Now
bysort household joint : gen partners_income = individual_income[3 - _n] if _N == 2
Get it? Each distinct combination of household and joint should be precisely 2 observations for us to be interested (hence the qualifier if _N == 2). If that's true then 3 - _n gives us the subscript of the other partner, as if _n is 1 then 3 - _n is 2 and vice versa. Under by: subscripts are always applied within groups, so that _n runs 1, 2, and so forth in each distinct group.
If this seems cryptic, it is all spelled out in Cox, N.J. 2008. The problem of split identity, or how to group dyads. Stata Journal 8(4): 588-591 which is accessible as a .pdf.
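As a rough end-to-end sketch of the joint-identifier approach, here it is on a small invented dataset, using the variable names from the question. The data values and the final couple_income variable are assumptions added for illustration. Placing the exclusion clause on the last generate is one possible choice, not necessarily the only sensible one.

```stata
* Toy data, invented for illustration only
clear
input household id partners_id individual_income
1 1 2 30000
1 2 1 45000
1 3 . 20000
2 4 5 52000
2 5 4 18000
end

* Put each pair of identifiers in a fixed order so both partners share one key
gen first = cond(id < partners_id, id, partners_id)
gen second = cond(id < partners_id, partners_id, id)
egen joint = concat(first second), punct(" ")

* Within each household and couple, pick up the other partner's income;
* people without a recorded partner are excluded
bysort household joint : gen partners_income = individual_income[3 - _n] if _N == 2 & !missing(partners_id)

* Combined income of the couple (missing if either income is missing)
gen couple_income = individual_income + partners_income

list household id partners_id individual_income partners_income couple_income, sepby(household)
```

Because missing numeric values sort above all numbers in Stata, a person with a missing partners_id ends up alone in a group here, so the _N == 2 qualifier already leaves that observation with a missing partners_income.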
Upvotes: 1