Reputation: 7517
I was wondering if there is a way to keep one row for each unique sch.id in my data below (e.g., the first row of each sch.id)?
Since there are 160 unique sch.id values, I expect 160 rows in the final output.
library(tidyverse)
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
data <- hsb %>% group_by(sch.id) %>% mutate(math_ave = mean(math))
Upvotes: 2
Views: 917
Reputation: 389235
A data.table approach could be:
library(data.table)
setDT(data)[, .SD[1], sch.id]
# sch.id math size sector meanses minority female ses math_ave
# 1: 1224 5.876 842 0 -0.428 0 1 -1.528 9.715447
# 2: 1288 7.857 1855 0 0.128 0 1 -0.788 13.510800
# 3: 1296 12.668 1719 0 -0.420 1 1 -0.148 7.635958
# 4: 1308 13.233 716 1 0.534 0 0 0.422 16.255500
# 5: 1317 12.862 455 1 0.351 0 1 0.882 13.177687
# ---
#156: 9359 19.797 1184 1 0.360 0 0 0.612 15.270623
#157: 9397 5.873 1314 0 0.140 0 0 0.502 10.355468
#158: 9508 13.932 1119 1 -0.132 0 0 0.242 13.574657
#159: 9550 1.766 1532 0 0.059 1 0 -0.228 11.089138
#160: 9586 14.076 262 1 0.627 0 1 0.852 14.863695
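To see what .SD[1] does without downloading the CSV, here is a small sketch on made-up data. The toy table and its values are my own assumption, not part of hsb; unique() with by = is shown only as a possible alternative that also keeps the first row per group.

```r
library(data.table)

# Hypothetical toy data standing in for hsb: 3 schools, 2 students each
toy <- data.table(
  sch.id = c(1, 1, 2, 2, 3, 3),
  math   = c(10, 20, 30, 50, 5, 7)
)

# Add the per-school mean by reference
toy[, math_ave := mean(math), by = sch.id]

# .SD is the subset of data for each group; .SD[1] is its first row
first_rows <- toy[, .SD[1], by = sch.id]

# Alternative: unique() keeps the first row for each value of the 'by' column
first_rows_alt <- unique(toy, by = "sch.id")
```

Both results have one row per sch.id, with math taken from the first student of each school.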
Upvotes: 1
Reputation: 887851
If we need all the variables, an option is to use distinct after the mutate with .keep_all = TRUE, so that it keeps the first row for each 'sch.id':
library(dplyr)
hsb %>%
group_by(sch.id) %>%
mutate(math_ave = mean(math)) %>%
ungroup() %>%
distinct(sch.id, .keep_all = TRUE)
# A tibble: 160 x 9
# sch.id math size sector meanses minority female ses math_ave
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1224 5.88 842 0 -0.428 0 1 -1.53 9.72
# 2 1288 7.86 1855 0 0.128 0 1 -0.788 13.5
# 3 1296 12.7 1719 0 -0.42 1 1 -0.148 7.64
# 4 1308 13.2 716 1 0.534 0 0 0.422 16.3
# 5 1317 12.9 455 1 0.351 0 1 0.882 13.2
# 6 1358 -1.35 1430 0 -0.014 1 0 0.032 11.2
# 7 1374 16.7 2400 0 -0.007 0 0 0.322 9.73
# 8 1433 12.9 899 1 0.718 0 0 0.812 19.7
# 9 1436 24.1 185 1 0.569 0 0 0.222 18.1
#10 1461 13.0 1672 0 0.683 0 1 0.042 16.8
# … with 150 more rows
Another option, without ungrouping, would be to slice the first row of each group:
hsb %>%
group_by(sch.id) %>%
mutate(math_ave = mean(math)) %>%
slice(1)
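As a quick check that the two dplyr options agree, here is a minimal sketch on made-up data (the toy tibble is an assumption for illustration, not the hsb data). Note that slice(1) returns a result that is still grouped, so the sketch ungroups it before comparing.

```r
library(dplyr)

# Hypothetical toy data: 3 schools, 2 students each
toy <- tibble(
  sch.id = c(1, 1, 2, 2, 3, 3),
  math   = c(10, 20, 30, 50, 5, 7)
)

# Per-school mean attached to every row
with_ave <- toy %>%
  group_by(sch.id) %>%
  mutate(math_ave = mean(math))

# Option 1: ungroup, then keep the first row per sch.id
via_distinct <- with_ave %>%
  ungroup() %>%
  distinct(sch.id, .keep_all = TRUE)

# Option 2: take the first row of each group, then drop the grouping
via_slice <- with_ave %>%
  slice(1) %>%
  ungroup()

# Compare the two results column by column
same_math <- identical(via_distinct$math, via_slice$math)
same_ave  <- identical(via_distinct$math_ave, via_slice$math_ave)
```

On this toy data both options return the same three rows.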
Or, using base R with ave and duplicated:
transform(hsb, math_ave = ave(math, sch.id))[!duplicated(hsb$sch.id),]
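The one-liner above packs two steps together. A sketch that splits them apart on made-up data (the toy data frame is an assumption, not hsb) may make it easier to follow:

```r
# Hypothetical toy data: 3 schools, 2 students each
toy <- data.frame(
  sch.id = c(1, 1, 2, 2, 3, 3),
  math   = c(10, 20, 30, 50, 5, 7)
)

# ave() returns a vector as long as its input, holding each row's group mean
# (mean is the default function applied within each group)
toy$math_ave <- ave(toy$math, toy$sch.id)

# duplicated() is FALSE for the first occurrence of each sch.id, TRUE afterwards
keep <- !duplicated(toy$sch.id)

# Keep only the first row of each school
first_rows <- toy[keep, ]
```

Because ave fills in the group mean on every row before the subsetting happens, the first row of each school already carries the mean computed over all of that school's rows.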
Upvotes: 1