Reputation: 63
I'm building a model to predict which students will pass a course, based on the time (in minutes) they spent on each task.
Course StudentID Task1 Task2 Task3 ...
AA 3547 2 9 2
AA 3548 5 2 5
AA 3549 1 7 3
AA 3550 2 9 2
AA 3551 5 2 5
AA 3542 2 9 2
BB 3543 5 2 5
BB 3544 1 7 3
BB 3555 2 9 2
CC 3556 5 2 5
However, the tasks differ from course to course: e.g. Task1 is a question and answer in course AA, but in another course it might be a video to watch or a wiki to read. I feel that feeding the data to the network as it is would be inappropriate, and I'm wondering if there's any way to sort this out. I thought of adding a task-type column next to each task, but that would be impossible due to the great number of tasks and students. Below is the type of each task in each course:
Course Task1 Task2 Task3
AA Q/A Video Wiki
BB Video Wiki Q/A
CC Wiki Wiki Video
Upvotes: 1
Views: 54
Reputation: 239
I would say adding a column for each task is the clean way to do this. However, if you do not want to add an extra column per task, you can store each task as a tuple instead.
Each task tuple would look like (taskId, timeSpent):
Course Student T1 T2
AA 357 (2, 22) (21, 14)
AA 358 (9, 33) (6, 35)
BB 359 (4, 19) (14, 19)
BB 360 (8, 34) (3, 28)
CC 361 (6, 9) (6, 19)
CC 362 (3, 14) (5, 22)
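As a rough sketch of the tuple idea in plain Python, using the two tables from the question: here the first element of each tuple is the task type rather than a numeric taskId, which is an assumption on my part, and the names `task_types`, `rows`, `to_pairs` and `minutes_per_type` are made up for illustration. The second helper goes one step beyond the tuples and sums the minutes per task type, which is only one possible way to get the same columns for every course.

```python
# Task types per course, copied from the question's second table.
task_types = {
    "AA": ["Q/A", "Video", "Wiki"],
    "BB": ["Video", "Wiki", "Q/A"],
    "CC": ["Wiki", "Wiki", "Video"],
}

# A few rows from the question's first table:
# (course, student_id, minutes spent on Task1..Task3)
rows = [
    ("AA", 3547, [2, 9, 2]),
    ("BB", 3543, [5, 2, 5]),
    ("CC", 3556, [5, 2, 5]),
]


def to_pairs(course, minutes):
    """Pair each time value with the type of the task it belongs to."""
    return list(zip(task_types[course], minutes))


def minutes_per_type(course, minutes):
    """Sum minutes by task type (assumed feature, not from the answer)."""
    totals = {"Q/A": 0, "Video": 0, "Wiki": 0}
    for task_type, spent in to_pairs(course, minutes):
        totals[task_type] += spent
    return totals


for course, student_id, minutes in rows:
    print(course, student_id, to_pairs(course, minutes),
          minutes_per_type(course, minutes))
```

The per-type totals lose the order of the tasks, so whether they are good enough depends on whether task order matters for your prediction.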
Upvotes: 1