Reputation: 207
I have a huge data set in r with one row per individual. One of my columns shows a family identifier (note, sex==1
, male, sex==2
, female).
ind sex income hw family.id
1 1 10 6 fam.1
2 2 8 7 fam.1
3 2 15 8 fam.2
4 1 7 4 fam.3
5 2 9 5 fam.3
How can I do a "double matching" so I can match couples in the data set for many of the variables that I am interested? For example, let's say individual 2, female, married with individual 1, male, should receive an entry in a new column with his income (same goes for hw):
ind sex income hw family.id income.male hw.male
1 1 10 6 fam.1 10 6
2 2 8 7 fam.1 8 6
3 2 15 8 fam.2 - -
4 1 7 4 fam.3 7 7
5 2 9 5 fam.3 9 7
I've said "double matching" in the title because I don't need to match only the family.ID, but I need to find a male that matches this fam.id
. The reason I am doing this is because later all males will be dropped from the data set and I will remain only with rows for females.
I am sorry I can't show any coding I've worked. I've tried many approaches using match, ifelse
, lapply
and even unlist
but it is not worth to add it here as unfortunately I can't make it work.
Anyone has a clue? We can work with both data.frames
or data.tables
environments.
Upvotes: 0
Views: 938
Reputation: 118779
To follow up on my comment:
require(data.table)
dt[dt[sex == 1L], c("i.m", "hw.m") := .(i.income, i.hw), on="family.id"][]
Extract the row indices where sex == 'male'
for each family.id
and add two columns by reference with the corresponding income
and hw
values.
where dt
is:
dt = fread('ind sex income hw family.id
1 1 10 6 fam.1
2 2 8 7 fam.1
3 2 15 8 fam.2
4 1 7 4 fam.3
5 2 9 5 fam.3')
Upvotes: 2
Reputation: 4643
You should go with data.table
package. Here is an example:
library(data.table)
dt <- data.table(ind = c(1, 2, 3, 4, 5), sex =c(1, 2, 2, 1, 2), income = c(10, 8, 15, 7, 9), hw = c(6, 7, 8, 4, 5), family.id = c('fam.1', 'fam.1', 'fam.2', 'fam.3', 'fam.3'))
setkeyv(dt, 'family.id')
dt2 <- dt[dt[sex == 1, list(family.id, income, hw)]]
It will take income
and hw
of males (dt[sex == 1, list(family.id, income, hw)]
) and match all individuals on family.id
. As a result you obtain:
ind sex income hw family.id i.income i.hw
1: 1 1 10 6 fam.1 10 6
2: 2 2 8 7 fam.1 10 6
3: 4 1 7 4 fam.3 7 4
4: 5 2 9 5 fam.3 7 4
columns with prefix i.
containing values of males for every family. Note that if no male is present you will not receive any row. If you still need this you can do:
dt2 <- merge(dt, dt[sex == 1, list(family.id, income, hw)], by = 'family.id', suffixes = c('', '.i'), all = TRUE)
to receive
family.id ind sex income hw income.i hw.i
1: fam.1 1 1 10 6 10 6
2: fam.1 2 2 8 7 10 6
3: fam.2 3 2 15 8 NA NA
4: fam.3 4 1 7 4 7 4
5: fam.3 5 2 9 5 7 4
Later when you need to drop male data you do:
dt2[sex == 2]
Upvotes: 2
Reputation: 263301
Let's assume that the dataframe is named 'dat'. You can merge the males and females by family.id with the merge function. You proposed answeer didn't make sense to me or to the otehr commenters but you can reassign "income" or "hw" within this new object.
> merge( dat[ dat$sex==1, ], dat[dat$sex==2,] , by="family.id")
family.id ind.x sex.x income.x hw.x ind.y sex.y income.y hw.y
1 fam.1 1 1 10 6 2 2 8 7
2 fam.3 4 1 7 4 5 2 9 5
Upvotes: 2