Paco

Reputation: 93

Check if for each unique value of var1, there is one observation where its value equals either var2 or var3 by group (var4) in R

I have a problem that I think is not complicated, but my knowledge of R is fairly basic and I can't find an answer. I have 4 variables. One is a grouping variable I call cluster. The other 3 (ID, IDman, IDwoman) are IDs of individuals. Something like this:

cluster <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
ID <- c(1, 7, 18, 3, 3, 9, 25, 10, 19)
IDman <- c(1, 2, 3, 3, 3, 4, 10, 10, 6)
IDwoman <- c(5, 7, 9, 11, 12, 14, 19, 19, 5)
households <- data.frame(cluster, ID, IDman, IDwoman)

The data frame (households) shows the individuals (ID) that live in a household (cluster). Sometimes those individuals are a married couple, and this is indicated by a certain combination of IDman and IDwoman: it happens when one ID equals an IDman and another ID equals an IDwoman within the same cluster. For example, in the first cluster (cluster=a, the first 3 rows) there is a marriage. IDman=1 and IDwoman=7 are a married couple because they are in the same household (cluster=a): ID and IDman both equal 1 in the first row, and ID and IDwoman both equal 7 in the second row (all within cluster a).

So, what I need is the number of unique combinations, for each cluster, of ID-equals-IDman and ID-equals-IDwoman. For instance, in the second cluster there are none (as there is no ID equal to an IDwoman), and in the third cluster there is again one, as IDman=10 and IDwoman=19 both appear in ID, and the repeated observation of IDman=10 and IDwoman=19 is not counted twice. The outcome doesn't need to be a dataset showing these links, just the number of these unique combinations per cluster.

I don't know how to solve this. I was trying things through apply or sapply functions, but none worked.

Any idea is very welcome.

Thank you!

Upvotes: 1

Views: 93

Answers (1)

Parfait

Reputation: 107767

Consider assigning a marriage column with ave (inline aggregation by group), where max is used to flag whether any value in the group is TRUE.

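households <- within(households, {
    # 1 if any IDman / IDwoman of the cluster appears in ID, else 0
    man <- ave(IDman %in% ID, cluster, FUN=max)
    woman <- ave(IDwoman %in% ID, cluster, FUN=max)
    marriage <- man == 1 & woman == 1

    # drop the helper columns so only marriage is added
    rm(man, woman)
})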

households
#   cluster ID IDman IDwoman marriage
# 1       a  1     1       5     TRUE
# 2       a  7     2       7     TRUE
# 3       a 18     3       9     TRUE
# 4       b  3     3      11    FALSE
# 5       b  3     3      12    FALSE
# 6       b  9     4      14    FALSE
# 7       c 25    10      19     TRUE
# 8       c 10    10      19     TRUE
# 9       c 19     6       5     TRUE
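Note that the %in% tests above compare against the whole ID column, not only the IDs of the same cluster. A minimal sketch of a stricter variant follows, assuming a cluster counts as containing a marriage when at least one of its IDman values and at least one of its IDwoman values appear among that cluster's own IDs. The name has_marriage is introduced here for illustration and is not part of the original answer.

```r
# Sample data copied from the question
cluster <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
ID      <- c(1, 7, 18, 3, 3, 9, 25, 10, 19)
IDman   <- c(1, 2, 3, 3, 3, 4, 10, 10, 6)
IDwoman <- c(5, 7, 9, 11, 12, 14, 19, 19, 5)
households <- data.frame(cluster, ID, IDman, IDwoman)

# Split into one data frame per cluster, then test membership
# against the IDs of that cluster only (assumption stated above).
has_marriage <- sapply(split(households, households$cluster), function(d) {
  any(d$IDman %in% d$ID) && any(d$IDwoman %in% d$ID)
})

has_marriage
#     a     b     c 
#  TRUE FALSE  TRUE 
```

On this sample data the result agrees with the ave version; the two could differ when an IDman or IDwoman value only matches an ID from a different cluster.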

And for unique combinations, filter the data frame by rows and columns, then run unique:

unique(households[households$marriage,
                  c("cluster", "marriage")])

#   cluster marriage
# 1       a     TRUE
# 7       c     TRUE
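The unique() result lists only the clusters flagged TRUE. As a rough sketch, assuming a one-value-per-cluster summary that also keeps clusters without a marriage is wanted, the flag can be collapsed with tapply. The name per_cluster is illustrative and not from the original answer; the marriage column is rebuilt here with the same ave logic so the snippet runs on its own.

```r
# Sample data copied from the question
cluster <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
ID      <- c(1, 7, 18, 3, 3, 9, 25, 10, 19)
IDman   <- c(1, 2, 3, 3, 3, 4, 10, 10, 6)
IDwoman <- c(5, 7, 9, 11, 12, 14, 19, 19, 5)
households <- data.frame(cluster, ID, IDman, IDwoman)

# Same flag as in the answer: TRUE for every row of a cluster in which
# some IDman and some IDwoman value appears in ID
households$marriage <- with(households,
  ave(IDman %in% ID, cluster, FUN = max) == 1 &
  ave(IDwoman %in% ID, cluster, FUN = max) == 1)

# One logical value per cluster, including clusters with no marriage
per_cluster <- tapply(households$marriage, households$cluster, any)
per_cluster

# Number of clusters flagged as containing a marriage
sum(per_cluster)
```

Because the flag is a yes/no value per cluster, this gives at most one marriage per cluster; counting several distinct couples within one household would need a different rule for pairing IDman and IDwoman values.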

Upvotes: 1
