Reputation: 7
I am trying to build a recommender system that suggests electives to new students based on their core courses and on historical student data (which contains both core courses and electives).
Any help is greatly appreciated. Thanks in advance!
Upvotes: 0
Views: 75
Reputation: 4378
Using reshape2 and the %>% operator from dplyr:
df <- read.csv(text="
Student_Num,1000,1100,2000,2200,4100,4200
1,A,B-,,B,,C
2,,B,A,,,
3,,,C,,E,
", stringsAsFactors = FALSE)
library(reshape2)
library(dplyr)
melt(df, id.vars = "Student_Num", value.name = "Grade") %>%
  mutate(variable = substr(variable, 2, 5)) %>%  # drop the "X" that read.csv prepends to numeric column names
  filter(Grade != "") %>%
  group_by(Student_Num) %>%
  summarize(Sequence = paste0(variable, ":", Grade, collapse = ","))
# Student_Num Sequence
# <int> <chr>
# 1 1 1000:A,1100:B-,2200:B,4200:C
# 2 2 1100:B,2000:A
# 3 3 2000:C,4100:E
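Since reshape2 is now retired in favour of tidyr, the same reshaping can be sketched with tidyr's pivot_longer() (available in tidyr 1.0.0 and later). This is only an alternative sketch, not the answer's original method; check.names = FALSE is my own addition so the course numbers keep their original names and the substr() step is not needed, and the names Course_Num and result are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# check.names = FALSE keeps "1000", "1100", ... as column names (no "X" prefix)
df <- read.csv(text = "
Student_Num,1000,1100,2000,2200,4100,4200
1,A,B-,,B,,C
2,,B,A,,,
3,,,C,,E,
", stringsAsFactors = FALSE, check.names = FALSE)

result <- df %>%
  # one row per student/course pair
  pivot_longer(-Student_Num, names_to = "Course_Num", values_to = "Grade") %>%
  # drop courses the student did not take
  filter(Grade != "") %>%
  group_by(Student_Num) %>%
  summarize(Sequence = paste0(Course_Num, ":", Grade, collapse = ","))

result
```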
Upvotes: 0
Reputation: 4551
Using the tidyverse suite of packages:
library(tidyverse)
# The pipe operator (%>%) passes df1 as the first argument of the next function,
# so the steps read in order rather than nested.
# tibble() replaces the deprecated data_frame().
df1 <- tibble(
  term_code = c(200701, 200701, 200707, 200701, 200801, 200807, 200707, 200701),
  student_number = rep(1:3, c(4, 2, 2)),
  course_number = c(1000, 2200, 1100, 4200, 2000, 1100, 2000, 4100),
  grade = c('A', 'B', 'B-', 'C', 'A', 'B', 'C', 'E')
)
df1 %>%
  unite(Sequence, c(course_number, grade), sep = ":") %>%
  group_by(student_number) %>%
  summarize(
    Sequence = paste(Sequence, collapse = ", ")
  )
If you aren't familiar with the pipe operator or the other functions I'm using, I would run this one piece at a time so you can see what each step does (it's all documented at https://www.tidyverse.org/). For example:
df1 %>%
  unite(Sequence, c(course_number, grade), sep = ":")
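If the sequence is meant to follow the order in which the courses were taken (an assumption on my part, the question doesn't say so), one option is to sort by term before collapsing. A minimal sketch on the same data; breaking ties within a term by course number and the name ordered_sequences are my own choices:

```r
library(dplyr)
library(tidyr)

df1 <- tibble(
  term_code = c(200701, 200701, 200707, 200701, 200801, 200807, 200707, 200701),
  student_number = rep(1:3, c(4, 2, 2)),
  course_number = c(1000, 2200, 1100, 4200, 2000, 1100, 2000, 4100),
  grade = c("A", "B", "B-", "C", "A", "B", "C", "E")
)

ordered_sequences <- df1 %>%
  # sort each student's rows by term, then by course number within a term
  arrange(student_number, term_code, course_number) %>%
  unite(Sequence, course_number, grade, sep = ":") %>%
  group_by(student_number) %>%
  summarize(Sequence = paste(Sequence, collapse = ", "))

ordered_sequences
```

summarize() keeps the row order within each group, so the arrange() call is what determines the order of the entries in each string.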
Upvotes: 1
Reputation: 28695
It would probably be easier to start from Table 1 (df1 in the example below):
library(dplyr)
set.seed(46)
# Random example data in the shape of Table 1
df1 <- data.frame(Term_Code   = sample(2001:2003, 7, replace = TRUE),
                  Student_Num = sample(1:3, 7, replace = TRUE),
                  Course_Num  = sample(1000:1003, 7, replace = TRUE),
                  Grade       = sample(LETTERS[1:4], 7, replace = TRUE),
                  stringsAsFactors = FALSE)
# Add a per-row Course:Grade column first
df1 %>%
  group_by(Student_Num) %>%
  mutate(Sequence = paste(Course_Num, Grade, sep = ':'))
# A tibble: 7 x 5
# Groups: Student_Num [3]
# Term_Code Student_Num Course_Num Grade Sequence
# <int> <int> <int> <chr> <chr>
#1 2001 2 1003 A 1003:A
#2 2001 3 1002 D 1002:D
#3 2002 3 1003 A 1003:A
#4 2002 1 1000 A 1000:A
#5 2001 1 1002 B 1002:B
#6 2002 2 1002 B 1002:B
#7 2003 1 1003 A 1003:A
# Then collapse to one row per student
df1 %>%
  group_by(Student_Num) %>%
  summarise(Sequence = paste(Course_Num, Grade, sep = ':', collapse = ', '))
# A tibble: 3 x 2
# Student_Num Sequence
# <int> <chr>
#1 1 1000:A, 1002:B, 1003:A
#2 2 1003:A, 1002:B
#3 3 1002:D, 1003:A
Upvotes: 1