Reputation: 13
I have student enrollment data that is currently organized as follows:
df <- tibble(course_number = c("Snow", 12345, 56789, "Stark", 10111, 21314, 15161),
academic_level = c("John", "UG", "UG", "Arya", "GR", "GR", "GR"),
course_id = c("middlename", "Wall101", "Wall102", "middlename", "Assassin501", "Assassin502", "Assassin503"))
My actual datasets have thousands of students, as well as a variety of more columns of course information, but the main problem I'm having is converting those rows of names into a new column with repeating name values based on the number of courses each student took. I'm familiar with gather and spread, and I have been able to separate out just the name information into its own column (currently saved in a separate df), but I need to find a way to count the courses so I know how many times each name has to repeat.
Thanks in advance!
Upvotes: 1
Views: 30
Reputation: 887118
May be we can create a grouping variable based on the occurrence of letters in the 'course_number', create a the 'name' based on the first
element of 'course_number' and 'academic_level' and remove the first row
library(dplyr)
library(stringr)
df %>%
group_by(grp = cumsum(str_detect(course_number, '[A-Za-z]'))) %>%
mutate(name = str_c(first(course_number), course_id,
first(academic_level), sep=" ")) %>%
slice(-1)
Upvotes: 1