Angela Grant
Angela Grant

Reputation: 13

R complicated version of converting dataset to long format (id information is spread across multiple disparate rows)

I have student enrollment data that is currently organized as follows:

df <- tibble(course_number = c("Snow", 12345, 56789, "Stark", 10111, 21314, 15161),
                    academic_level = c("John", "UG", "UG", "Arya", "GR", "GR", "GR"),
                    course_id = c("middlename", "Wall101", "Wall102", "middlename", "Assassin501", "Assassin502", "Assassin503"))

My actual datasets have thousands of students, as well as a variety of more columns of course information, but the main problem I'm having is converting those rows of names into a new column with repeating name values based on the number of courses each student took. I'm familiar with gather and spread, and I have been able to separate out just the name information into its own column (currently saved in a separate df), but I need to find a way to count the courses so I know how many times each name has to repeat.

Thanks in advance!

Upvotes: 1

Views: 30

Answers (1)

akrun
akrun

Reputation: 887118

May be we can create a grouping variable based on the occurrence of letters in the 'course_number', create a the 'name' based on the first element of 'course_number' and 'academic_level' and remove the first row

library(dplyr)
library(stringr)    
df %>% 
  group_by(grp = cumsum(str_detect(course_number, '[A-Za-z]'))) %>% 
  mutate(name = str_c(first(course_number), course_id,  
                 first(academic_level), sep=" ")) %>%
   slice(-1)

Upvotes: 1

Related Questions