Reputation: 93
I have a problem that I think is not complicated, but my knowledge of R is pretty basic, so I can't find an answer. I have 4 variables. One is a grouping variable I call cluster. The other 3 (ID, IDman, IDwoman) are IDs of individuals. Something like this:
cluster <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
ID <- c(1, 7, 18, 3, 3, 9, 25, 10, 19)
IDman <- c(1, 2, 3, 3, 3, 4, 10, 10, 6)
IDwoman <- c(5, 7, 9, 11, 12, 14, 19, 19, 5)
households <- data.frame(cluster, ID, IDman, IDwoman)
The data frame (households) basically shows the individuals (ID) that are in a household (cluster). Sometimes those individuals are a married couple, and this information is given by a certain combination of IDman and IDwoman: it happens when ID equals IDman and ID equals IDwoman within the same cluster. For example, in the first cluster (cluster=a, the first 3 rows) there is a marriage. IDman=1 and IDwoman=7 are a marriage because they are in the same household (cluster=a): ID and IDman both equal 1 in the first row, and ID and IDwoman both equal 7 in the second row (all of it happening within cluster a).
So, what I need is to find, for each cluster, the number of unique combinations of ID-equals-IDman and ID-equals-IDwoman. For instance, in the second cluster we have none (as there is no IDwoman=9), and in the third cluster we again have one, as IDman=10 and IDwoman=19 both appear in ID, and the repetition of the observation IDman=10 and IDwoman=19 is not counted. The outcome doesn't need to be a dataset showing these links, just the number of these unique combinations per cluster.
I don't know how to solve this. I tried a few things with the apply and sapply functions, but none of them worked.
Any idea is very welcome.
Thank you!
Upvotes: 1
Views: 93
Reputation: 107767
Consider assigning a marriage column with ave (in-line aggregation by groups), where max is used to return any TRUE values.
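households <- within(households, {
  # 1 if any IDman / IDwoman in the cluster also appears in ID, else 0
  # (note: %in% is evaluated against the whole ID column, not per cluster)
  man <- ave(IDman %in% ID, cluster, FUN=max)
  woman <- ave(IDwoman %in% ID, cluster, FUN=max)
  # a cluster is flagged when both a man and a woman are matched
  marriage <- man == 1 & woman == 1
  rm(man, woman)
})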
households
#   cluster ID IDman IDwoman marriage
# 1       a  1     1       5     TRUE
# 2       a  7     2       7     TRUE
# 3       a 18     3       9     TRUE
# 4       b  3     3      11    FALSE
# 5       b  3     3      12    FALSE
# 6       b  9     4      14    FALSE
# 7       c 25    10      19     TRUE
# 8       c 10    10      19     TRUE
# 9       c 19     6       5     TRUE
And for unique combinations, filter the data frame by rows and columns accordingly, then run unique:
unique(households[households$marriage == TRUE,
c("cluster", "marriage")])
#   cluster marriage
# 1       a     TRUE
# 7       c     TRUE
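If a count per cluster is needed rather than a TRUE/FALSE flag, one possible sketch in base R is to do the matching inside each cluster with split and sapply. The helper name count_matches is made up for illustration, and the last step rests on an assumption the question does not spell out: that each couple needs one matched man and one matched woman, so the number of couples is taken as the smaller of the two counts. Adjust that rule if your definition differs.

```r
# Example data copied from the question
cluster <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
ID      <- c(1, 7, 18, 3, 3, 9, 25, 10, 19)
IDman   <- c(1, 2, 3, 3, 3, 4, 10, 10, 6)
IDwoman <- c(5, 7, 9, 11, 12, 14, 19, 19, 5)
households <- data.frame(cluster, ID, IDman, IDwoman)

# Hypothetical helper: for ONE cluster, count the distinct IDman and
# IDwoman values that also occur as an ID inside that same cluster.
count_matches <- function(d) {
  men   <- unique(d$IDman[d$IDman %in% d$ID])
  women <- unique(d$IDwoman[d$IDwoman %in% d$ID])
  c(men = length(men), women = length(women))
}

# One row per cluster, columns "men" and "women"
matches <- t(sapply(split(households, households$cluster), count_matches))

# Assumption: a couple needs one matched man and one matched woman,
# so the count is the smaller of the two numbers.
couples <- pmin(matches[, "men"], matches[, "women"])
couples
# a b c
# 1 0 1
```

Because unique is applied before counting, the repeated IDman=10 / IDwoman=19 observation in cluster c is only counted once.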
Upvotes: 1