Reputation: 857
The aim is to get the highest educational value between two partners in a household by disregarding the educational level of the children. The first column hhid
is the household number id, the second column is the individuals id
. The third column relation
is the relationship between the individuals in a household: 1
refers to the head of the household, 2
refers to the partner and 3
refers to the children. The fourth column refers to the educational level of these individuals.
The fifth column is the column i would like to get using a code. The aim is to only focus on the highest educational level within a household but only between parents. I usually use pmax
to get the highest value between two columns, and group_by
to aggregate individuals under a group like a household, but these two commands to not seem to work in this case. Could someone help please?
hhid id relation education highest_education
1 1 1 3 3
1 2 2 2 3
1 3 3 5 3
2 4 1 4 4
2 5 2 2 4
3 6 1 1 2
3 7 2 2 2
4 8 1 1 3
4 9 2 3 3
4 10 3 4 3
Here is the data:
structure(list(hhid = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4), id = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10), relation = c(1, 2, 3, 1, 2, 1, 2,
1, 2, 3), education = c(3, 2, 5, 4, 2, 1, 2, 1, 3, 4)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 23
Reputation: 123903
You could do:
library(dplyr)
df %>%
group_by(hhid) %>%
mutate(highest_education = max(education[relation %in% c(1, 2)])) %>%
ungroup()
#> # A tibble: 10 × 5
#> hhid id relation education highest_education
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 3 3
#> 2 1 2 2 2 3
#> 3 1 3 3 5 3
#> 4 2 4 1 4 4
#> 5 2 5 2 2 4
#> 6 3 6 1 1 2
#> 7 3 7 2 2 2
#> 8 4 8 1 1 3
#> 9 4 9 2 3 3
#> 10 4 10 3 4 3
Upvotes: 1