Laura K
Laura K

Reputation: 207

Double match in r

I have a huge data set in r with one row per individual. One of my columns shows a family identifier (note, sex==1, male, sex==2, female).

ind sex income  hw  family.id
1   1    10     6    fam.1
2   2    8      7    fam.1
3   2    15     8    fam.2
4   1    7      4    fam.3
5   2    9      5    fam.3

How can I do a "double matching" so I can match couples in the data set for many of the variables that I am interested? For example, let's say individual 2, female, married with individual 1, male, should receive an entry in a new column with his income (same goes for hw):

ind sex income  hw  family.id  income.male   hw.male
1   1    10     6    fam.1       10            6
2   2    8      7    fam.1       8             6
3   2    15     8    fam.2       -             -
4   1    7      4    fam.3       7             7  
5   2    9      5    fam.3       9             7

I've said "double matching" in the title because I don't need to match only the family.ID, but I need to find a male that matches this fam.id. The reason I am doing this is because later all males will be dropped from the data set and I will remain only with rows for females.

I am sorry I can't show any coding I've worked. I've tried many approaches using match, ifelse, lapply and even unlist but it is not worth to add it here as unfortunately I can't make it work.

Anyone has a clue? We can work with both data.frames or data.tables environments.

Upvotes: 0

Views: 938

Answers (3)

Arun
Arun

Reputation: 118779

To follow up on my comment:

require(data.table)
dt[dt[sex == 1L], c("i.m", "hw.m") := .(i.income, i.hw), on="family.id"][]

Extract the row indices where sex == 'male' for each family.id and add two columns by reference with the corresponding income and hw values.


where dt is:

dt = fread('ind sex income  hw  family.id
1   1    10     6    fam.1
2   2    8      7    fam.1
3   2    15     8    fam.2
4   1    7      4    fam.3
5   2    9      5    fam.3')

Upvotes: 2

danas.zuokas
danas.zuokas

Reputation: 4643

You should go with data.table package. Here is an example:

library(data.table)

dt <- data.table(ind = c(1, 2, 3, 4, 5), sex =c(1, 2, 2, 1, 2), income = c(10, 8, 15, 7, 9), hw = c(6, 7, 8, 4, 5), family.id = c('fam.1', 'fam.1', 'fam.2', 'fam.3', 'fam.3'))
setkeyv(dt, 'family.id')

dt2 <- dt[dt[sex == 1, list(family.id, income, hw)]]

It will take income and hw of males (dt[sex == 1, list(family.id, income, hw)]) and match all individuals on family.id. As a result you obtain:

   ind sex income hw family.id i.income i.hw
1:   1   1     10  6     fam.1       10    6
2:   2   2      8  7     fam.1       10    6
3:   4   1      7  4     fam.3        7    4
4:   5   2      9  5     fam.3        7    4

columns with prefix i. containing values of males for every family. Note that if no male is present you will not receive any row. If you still need this you can do:

dt2 <- merge(dt, dt[sex == 1, list(family.id, income, hw)], by = 'family.id', suffixes = c('', '.i'), all = TRUE)

to receive

   family.id ind sex income hw income.i hw.i
1:     fam.1   1   1     10  6       10    6
2:     fam.1   2   2      8  7       10    6
3:     fam.2   3   2     15  8       NA   NA
4:     fam.3   4   1      7  4        7    4
5:     fam.3   5   2      9  5        7    4

Later when you need to drop male data you do:

dt2[sex == 2]

Upvotes: 2

IRTFM
IRTFM

Reputation: 263301

Let's assume that the dataframe is named 'dat'. You can merge the males and females by family.id with the merge function. You proposed answeer didn't make sense to me or to the otehr commenters but you can reassign "income" or "hw" within this new object.

> merge( dat[ dat$sex==1, ], dat[dat$sex==2,] , by="family.id")
  family.id ind.x sex.x income.x hw.x ind.y sex.y income.y hw.y
1     fam.1     1     1       10    6     2     2        8    7
2     fam.3     4     1        7    4     5     2        9    5

Upvotes: 2

Related Questions