Reputation: 561
I want to fill my Panel dataset with rows based on the existing range of the grouping variables.
To create a better understanding I will use an example dataset:
> df<-data.frame(Student=c(1, 1, 2), Year=c(1,2,2), Type=c("Test","Exam","Test"),Points=c(80,140,30))
> df
Student Year Type Points
1 1 1 Test 80
2 1 2 Exam 140
3 2 2 Test 30
What I want to have is for every Student for every year two obervations based on the range of points. It should look like this after the transformation:
> df2<-data.frame(Student=c(1, 1, 1,1,2,2,2,2), Year=c(1,1,2,2,1,1,2,2), PointRange=c("0_100","100_200","0_100","100_200","0_100","100_200","0_100","100_200"), n_tests=c(1,0,0,0,0,0,1,0), n_exams=c(0,0,0,1,0,0,0,0))
> df2
Student Year PointRange n_tests n_exams
1 1 1 0_100 1 0
2 1 1 100_200 0 0
3 1 2 0_100 0 0
4 1 2 100_200 0 1
5 2 1 0_100 0 0
6 2 1 100_200 0 0
7 2 2 0_100 1 0
8 2 2 100_200 0 0
I've tried the following with dplyr-package:
df %>% mutate(PointRange = case_when(Points >= 0 & Points <= 100 ~ 1, Points >= 101 & Points <= 200 ~ 2)) %>%
+ group_by(Student, Year, PointRange) %>%
+ summarise(n_tests = sum(Type == "Test"),
+ n_exams = sum(Type=="Exam"))
# A tibble: 3 x 5
# Groups: Student, Year [?]
Student Year PointRange n_tests n_exams
<dbl> <dbl> <dbl> <int> <int>
1 1 1 1 1 0
2 1 2 2 0 1
3 2 2 1 1 0
What's missing then are five rows so that for every user I have two Point-Ranges for every year. How can I fix that?
Upvotes: 0
Views: 162
Reputation: 11140
You can use cut
to create ranges and then dplyr::complete()
to create all combinations of student, year, and range -
result <- df %>%
mutate(PointRange = cut(Points, breaks = c(0, 100, 200), right= F)) %>%
complete(Student, Year, PointRange) %>%
group_by(Student, Year, PointRange) %>%
summarize(
n_tests = sum(Type == "Test", na.rm = T),
n_exams = sum(Type == "Exam", na.rm = T)
)
# A tibble: 8 x 5
# Groups: Student, Year [?]
Student Year PointRange n_tests n_exams
<dbl> <dbl> <fct> <int> <int>
1 1.00 1.00 [0,100) 1 0
2 1.00 1.00 [100,200) 0 0
3 1.00 2.00 [0,100) 0 0
4 1.00 2.00 [100,200) 0 1
5 2.00 1.00 [0,100) 0 0
6 2.00 1.00 [100,200) 0 0
7 2.00 2.00 [0,100) 1 0
8 2.00 2.00 [100,200) 0 0
Upvotes: 1