M.C. Park

Reputation: 305

How to remove duplicated rows using two columns

I have a data set like the one below:

> set.seed(1)
> tmp <- data.frame(household.id = rep(1000, 6), individual.id = rep(1:3, each = 2),
                 age = rep(c(55, 52, 27), each = 2),
                 income = runif(6)*100)

> tmp
  household.id individual.id age   income
1         1000             1  55 26.55087
2         1000             1  55 37.21239
3         1000             2  52 57.28534
4         1000             2  52 90.82078
5         1000             3  27 20.16819
6         1000             3  27 89.83897

That is, individual "1" is the father of household "1000", "2" is the mother, and "3" is the daughter. In this case, I want to keep only rows 1, 3, and 5.

(i.e., I want to keep one row per combination of household.id and individual.id and drop the duplicates)

After that, I also want to create separate variables for the father's age, the mother's age, and the daughter's age. How can I do this?

Upvotes: 0

Views: 40

Answers (1)

Ronak Shah

Reputation: 388862

Do you need something like this?

library(dplyr)
library(tidyr)

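tmp %>%
  # label each individual.id with its role in the household
  mutate(relation = recode(individual.id,  `1` = 'father', 
                           `2` = 'mother', `3` = 'daughter' )) %>%
  # one column per role; values_fn = first keeps the first age among duplicated rows
  pivot_wider(names_from = relation, values_from = age, 
              id_cols =  household.id, values_fn = first)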


#  household.id father mother daughter
#         <dbl>  <dbl>  <dbl>    <dbl>
#1         1000     55     52       27
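
If you also want the de-duplicated rows themselves (rows 1, 3, and 5 from the question), here is a minimal sketch that drops the duplicates explicitly before reshaping. It assumes the first row of each household.id / individual.id pair is the one to keep, and that the 1 = father, 2 = mother, 3 = daughter coding from the question holds for every household. The names `dedup`, `wide`, and the `*.age` column names are just illustrative choices.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
set.seed(1)
tmp <- data.frame(household.id = rep(1000, 6), individual.id = rep(1:3, each = 2),
                  age = rep(c(55, 52, 27), each = 2),
                  income = runif(6) * 100)

# Keep the first row for each household.id / individual.id combination;
# .keep_all = TRUE retains the other columns (age, income) of that row
dedup <- tmp %>%
  distinct(household.id, individual.id, .keep_all = TRUE)

# Turn the ages into one column per family member
# (assumes the 1/2/3 coding described in the question)
wide <- dedup %>%
  mutate(relation = recode(individual.id, `1` = "father.age",
                           `2` = "mother.age", `3` = "daughter.age")) %>%
  pivot_wider(id_cols = household.id, names_from = relation, values_from = age)
```

Since `dedup` already has one row per person, `values_fn` is not needed here. In base R, `tmp[!duplicated(tmp[c("household.id", "individual.id")]), ]` should give the same de-duplicated rows.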

Upvotes: 3
