Reputation: 1
Say I have a table like this:
Students | Equipment # |
---|---|
A | 101 |
A | 102 |
A | 103 |
B | 104 |
B | 105 |
B | 106 |
B | 107 |
B | 108 |
C | 109 |
C | 110 |
C | 111 |
C | 112 |
I want to sample equipment numbers for each student in the data frame, with a different sample size per student.
For example, I want 1 equipment # from student "A", 2 from student "B", and 3 from student "C". How can I achieve this in R?
This is the code that I have now, but I'm only getting 1 equipment # printed from each student.
students <- unique(df$`Students`)
sample_size <- c(1,2,3)
for (i in students){
s <- sample(df[df$`Students` == i,]$`Equipment #`, size = sample_size, replace = FALSE)
print(s)
}
Upvotes: 0
Views: 603
Reputation: 389215
You can create a data frame that maps each student to the number of rows to sample, join it to the data, and then use sample_n to draw that many rows from each group.
library(dplyr)
sample_data <- data.frame(Students = c('A', 'B', 'C'), nr = 1:3)
df %>%
  left_join(sample_data, by = 'Students') %>%
  group_by(Students) %>%
  sample_n(first(nr)) %>%
  ungroup() %>%
  select(-nr) -> s
s
# Students Equipment
# <chr> <int>
#1 A 102
#2 B 108
#3 B 105
#4 C 110
#5 C 112
#6 C 111
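For comparison, the same lookup idea can be sketched in base R without a join. This is only an illustration under assumptions: the named vector `sample_size` and the column name `Equipment` are made up for the example, and the data frame is rebuilt from the table in the question.

```r
# Example data rebuilt from the question (column name `Equipment` is an assumption)
df <- data.frame(
  Students  = rep(c("A", "B", "C"), times = c(3, 5, 4)),
  Equipment = 101:112,
  stringsAsFactors = FALSE
)

# Hypothetical lookup: how many rows to draw for each student
sample_size <- c(A = 1, B = 2, C = 3)

# Split into one data frame per student, then draw the requested
# number of row positions from each piece (without replacement)
groups <- split(df, df$Students)
picked <- Map(
  function(g, n) g[sample(nrow(g), n), , drop = FALSE],
  groups,
  sample_size[names(groups)]
)

# Stack the sampled pieces back into a single data frame
result <- do.call(rbind, picked)
print(result)
```

Looking the sizes up by name, rather than by position, means the order of the students in the data does not have to match the order of the sizes.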
Upvotes: 1
Reputation: 5254
You're close. You need to index the sample_size vector inside the loop; otherwise the whole vector is passed as size and only its first item is used on each iteration.
library(dplyr)
# set up data
df <- data.frame(Students = c(rep("A", 3),
                              rep("B", 5),
                              rep("C", 4)),
                 Equipment_num = 101:112)
# create vector of students
students <- df %>%
  pull(Students) %>%
  unique()
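# sample sizes, in the same order as `students`
sample_size <- c(1, 2, 3)
# sample and print, indexing sample_size by the loop position
for (i in seq_along(students)) {
  p <- df %>%
    filter(Students == students[i]) %>%
    slice_sample(n = sample_size[i])
  print(p)
}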
#> Students Equipment_num
#> 1 A 102
#> Students Equipment_num
#> 1 B 107
#> 2 B 105
#> Students Equipment_num
#> 1 C 109
#> 2 C 110
#> 3 C 112
Created on 2021-08-06 by the reprex package (v2.0.0)
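To stay closer to the loop in the question, here is a minimal base R sketch of the same fix, indexing `sample_size` by position. The column is named `Equipment_num` as in the set-up above, and the `picked` list is a hypothetical addition so that the draws are kept rather than only printed. Sampling positions with `sample.int()` instead of the values themselves sidesteps the case where `sample()` treats a single number `x` as the range `1:x`.

```r
# Example data, same shape as in the set-up above
df <- data.frame(
  Students      = rep(c("A", "B", "C"), times = c(3, 5, 4)),
  Equipment_num = 101:112,
  stringsAsFactors = FALSE
)

students    <- unique(df$Students)
sample_size <- c(1, 2, 3)  # one size per student, same order as `students`

# Hypothetical container so the sampled values can be reused later
picked <- vector("list", length(students))
names(picked) <- students

for (i in seq_along(students)) {
  # All equipment numbers that belong to the current student
  pool <- df$Equipment_num[df$Students == students[i]]
  # Draw sample_size[i] positions without replacement, then look up the values
  picked[[i]] <- pool[sample.int(length(pool), size = sample_size[i])]
  print(picked[[i]])
}
```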
Actually this is a much more elegant and generalizable way to tackle this problem.
Upvotes: 0