Reputation: 305
I have a data set like the one below:
set.seed(1)
tmp <- data.frame(household.id = rep(1000, 6),
                  individual.id = rep(1:3, each = 2),
                  age = rep(c(55, 52, 27), each = 2),
                  income = runif(6) * 100)
tmp
household.id individual.id age income
1 1000 1 55 26.55087
2 1000 1 55 37.21239
3 1000 2 52 57.28534
4 1000 2 52 90.82078
5 1000 3 27 20.16819
6 1000 3 27 89.83897
That is, individual 1 is the father in household 1000, individual 2 is the mother, and individual 3 is the daughter. In this case, I want to keep only rows 1, 3, and 5.
(i.e. I want to remove the duplicated rows, keeping one row per combination of household.id and individual.id.)
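A minimal base R sketch of this first step, assuming the `tmp` data frame defined above. `duplicated()` on the two id columns flags every repeat of a (household.id, individual.id) pair after its first occurrence, so negating it keeps the first row of each pair (rows 1, 3, and 5 here). The name `dedup` is just an illustrative choice.

```r
set.seed(1)
tmp <- data.frame(household.id = rep(1000, 6),
                  individual.id = rep(1:3, each = 2),
                  age = rep(c(55, 52, 27), each = 2),
                  income = runif(6) * 100)

# TRUE for every row whose (household.id, individual.id) pair was already seen
is_repeat <- duplicated(tmp[c("household.id", "individual.id")])

# keep only the first row of each pair
dedup <- tmp[!is_repeat, ]
dedup
```

Which duplicate survives is simply the first one in row order; if the duplicates should be chosen by some other rule (for example, the highest income), the data would need to be sorted accordingly first.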
After that, I also want to create separate variables for the father's, mother's, and daughter's ages. How can I do this?
Upvotes: 0
Views: 40
Reputation: 388862
Do you need something like this?
library(dplyr)
library(tidyr)
tmp %>%
  mutate(relation = recode(individual.id, `1` = 'father',
                           `2` = 'mother', `3` = 'daughter')) %>%
  pivot_wider(names_from = relation, values_from = age,
              id_cols = household.id, values_fn = first)
# household.id father mother daughter
# <dbl> <dbl> <dbl> <dbl>
#1 1000 55 52 27
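For comparison, here is a base R sketch that follows the question's two steps literally: drop the duplicated rows first, then reshape to one row per household. The id-to-role mapping (1 = father, 2 = mother, 3 = daughter) is taken from the question and hard-coded as an assumption. `reshape()` names the new columns `age.father`, `age.mother`, and `age.daughter`, so the `age.` prefix is stripped afterwards.

```r
set.seed(1)
tmp <- data.frame(household.id = rep(1000, 6),
                  individual.id = rep(1:3, each = 2),
                  age = rep(c(55, 52, 27), each = 2),
                  income = runif(6) * 100)

# step 1: keep the first row of each (household.id, individual.id) pair
first_rows <- tmp[!duplicated(tmp[c("household.id", "individual.id")]), ]

# assumed mapping from individual.id to role, as described in the question
roles <- c(`1` = "father", `2` = "mother", `3` = "daughter")
first_rows$relation <- unname(roles[as.character(first_rows$individual.id)])

# step 2: one row per household, one age column per role
wide <- reshape(first_rows[c("household.id", "relation", "age")],
                idvar = "household.id", timevar = "relation",
                direction = "wide")

# rename age.father -> father, etc.
names(wide) <- sub("^age\\.", "", names(wide))
wide
```

Deduplicating first means no aggregation function is needed during the reshape; the `values_fn = first` argument in the `pivot_wider()` call above handles the duplicates in a single step instead.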
Upvotes: 3