How to get the highest value in a column depending on three other columns?

Question

The aim is to get the highest educational value between two partners in a household by disregarding the educational level of the children. The first column hhid is the household number id, the second column is the individuals id. The third column relation is the relationship between the individuals in a household: 1 refers to the head of the household, 2 refers to the partner and 3 refers to the children. The fourth column refers to the educational level of these individuals.

The fifth column is the column i would like to get using a code. The aim is to only focus on the highest educational level within a household but only between parents. I usually use pmax to get the highest value between two columns, and group_by to aggregate individuals under a group like a household, but these two commands to not seem to work in this case. Could someone help please?

 hhid id     relation    education    highest_education
    
  1     1        1         3                 3
  1     2        2         2                 3
  1     3        3         5                 3
  2     4        1         4                 4
  2     5        2         2                 4
  3     6        1         1                 2
  3     7        2         2                 2
  4     8        1         1                 3
  4     9        2         3                 3
  4    10        3         4                 3

Here is the data:

structure(list(hhid = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4), id = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10), relation = c(1, 2, 3, 1, 2, 1, 2, 
1, 2, 3), education = c(3, 2, 5, 4, 2, 1, 2, 1, 3, 4)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

stefan · Accepted Answer

You could do:

library(dplyr)

df %>% 
  group_by(hhid) %>% 
  mutate(highest_education = max(education[relation %in% c(1, 2)])) %>% 
  ungroup()
#> # A tibble: 10 × 5
#>     hhid    id relation education highest_education
#>                           
#>  1     1     1        1         3                 3
#>  2     1     2        2         2                 3
#>  3     1     3        3         5                 3
#>  4     2     4        1         4                 4
#>  5     2     5        2         2                 4
#>  6     3     6        1         1                 2
#>  7     3     7        2         2                 2
#>  8     4     8        1         1                 3
#>  9     4     9        2         3                 3
#> 10     4    10        3         4                 3

How to get the highest value in a column depending on three other columns?

Answers (1)

Related Questions