Extract one row with a unique id variable in R

Question

Is there a way to extract one row for each unique sch.id in my data below (e.g., the first row for each sch.id)?

Since there are 160 unique sch.id values, I expect 160 rows in the final output.

library(tidyverse)

hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')

data <- hsb %>% group_by(sch.id) %>% mutate(math_ave = mean(math))

akrun · Accepted Answer

If we need all the variables, one option is to call distinct with .keep_all = TRUE after the mutate, which keeps the first row for each 'sch.id' together with every other column:

library(dplyr)
hsb %>% 
  group_by(sch.id) %>% 
  mutate(math_ave = mean(math)) %>%
  ungroup() %>% 
  distinct(sch.id, .keep_all = TRUE)
# A tibble: 160 x 9
#   sch.id  math  size sector meanses minority female    ses math_ave
# 1   1224  5.88   842      0  -0.428        0      1 -1.53      9.72
# 2   1288  7.86  1855      0   0.128        0      1 -0.788    13.5 
# 3   1296 12.7   1719      0  -0.42         1      1 -0.148     7.64
# 4   1308 13.2    716      1   0.534        0      0  0.422    16.3 
# 5   1317 12.9    455      1   0.351        0      1  0.882    13.2 
# 6   1358 -1.35  1430      0  -0.014        1      0  0.032    11.2 
# 7   1374 16.7   2400      0  -0.007        0      0  0.322     9.73
# 8   1433 12.9    899      1   0.718        0      0  0.812    19.7 
# 9   1436 24.1    185      1   0.569        0      0  0.222    18.1 
#10   1461 13.0   1672      0   0.683        0      1  0.042    16.8 
# … with 150 more rows
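
To make the behaviour of distinct easy to check by hand, here is a minimal sketch on a made-up data frame. The name `toy` and its values are assumptions for illustration only and are not part of hsb; the pipeline itself is the same as above.

```r
library(dplyr)

# Hypothetical toy data standing in for hsb: two schools, five students
toy <- data.frame(
  sch.id = c(1, 1, 2, 2, 2),
  math   = c(10, 20, 3, 6, 9)
)

first_rows <- toy %>%
  group_by(sch.id) %>%
  mutate(math_ave = mean(math)) %>%    # group mean repeated on every row
  ungroup() %>%
  distinct(sch.id, .keep_all = TRUE)   # first row per sch.id, all columns kept

first_rows
```

Each school contributes exactly one row, and math_ave still holds the mean over all of that school's rows because it was computed before the duplicates were dropped.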

Another option, without ungrouping, is to slice the first row of each group (note that the result stays grouped by sch.id):

hsb %>% 
    group_by(sch.id) %>% 
    mutate(math_ave = mean(math)) %>%
    slice(1)
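
One thing worth checking with the slice approach is that the output is still a grouped data frame. The sketch below uses a made-up data frame (the name `toy_scores` and its values are assumptions, not taken from hsb) to show this, and drops the grouping afterwards in case later steps should not operate per school.

```r
library(dplyr)

# Hypothetical toy data: two schools, five students
toy_scores <- data.frame(
  sch.id = c(1, 1, 2, 2, 2),
  math   = c(10, 20, 3, 6, 9)
)

sliced <- toy_scores %>%
  group_by(sch.id) %>%
  mutate(math_ave = mean(math)) %>%
  slice(1)                        # first row within each group

still_grouped <- is_grouped_df(sliced)  # grouping survives slice()

# Drop the grouping if later verbs should see the whole table at once
sliced_flat <- ungroup(sliced)
```

In dplyr 1.0.0 and later, slice_head(n = 1) is an equivalent way to write the last step.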

Or using base R, with ave to compute the per-school mean and duplicated to keep only the first row of each sch.id:

transform(hsb, math_ave = ave(math, sch.id))[!duplicated(hsb$sch.id),]
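
The base R one-liner can be unpacked into two steps. This is a minimal sketch on a made-up data frame (the names `toy_base`, `with_ave` and `base_first` are assumptions for illustration), relying on the fact that ave uses mean as its default FUN.

```r
# Hypothetical toy data: two schools, five students
toy_base <- data.frame(
  sch.id = c(1, 1, 2, 2, 2),
  math   = c(10, 20, 3, 6, 9)
)

# ave() with no FUN argument returns the group mean, aligned row by row
with_ave <- transform(toy_base, math_ave = ave(math, sch.id))

# duplicated() is TRUE for the second and later occurrences of an id,
# so negating it keeps only the first row of each sch.id
base_first <- with_ave[!duplicated(with_ave$sch.id), ]

base_first
```

The original row names are preserved, so the kept rows also show which positions in the input they came from.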
