Separate a column of variable and a column of values into their components in r

Question

I have the following data frame (df) where for each participant, the column "grades" was built by pasting the scores on some items. So for example, the participant '1' has obtained 4 on one item 'a', 7 on item 'b', and 8 on item 'c'. (Note that grades is a character). Note also that participant '2' has only two scores (let say this was a different exam from participant '1').

df = data.frame(participants = c(1, 1, 2),
                variables = c('abc', 'ef', 'abc'),
                grades= c('478', '58', '942'),
                stringsAsFactors = FALSE)


 participants variables grades
1            1       abc    478
2            1        ef     58
3            2       abc    942

(my data contains 100,000 rows like the above.)

I want to transform the data into a tidy style, as the following:

  participants variables grades
1            1         a      4
2            1         b      7
3            1         c      8
4            1         e      5
5            1         f      8
6            2         a      9
7            2         b      4
8            2         c      2

Here what I did

variables =  lapply(X=1:length(df$variables), FUN=function(X) {
  strsplit(df$variables[X], "") %>% .[[1]]}) %>% reduce(c)

grades =  lapply(X=1:length(df$grades), FUN=function(X) {
  strsplit(df$grades[X], "") %>% .[[1]]}) %>% reduce(c)

participants =  lapply(X=1:length(df$participants), FUN=function(X) {
  rep(df$participants[X], nchar(df$variables[X])) })%>% reduce(c)

data.frame(participants, variables, grades)

Yet, it takes several minutes on my machine with my real data, and I feel it is really not efficient because I need 3 different calls.

Any idea for an efficient way to obtain the tidy data would be welcome (I work with tidyr/dplyr)

moodymudskipper · Accepted Answer

You can use strsplit and unnest:

library(tidyverse)
df %>% mutate_at(vars(variables,grades),~strsplit(.,"")) %>% unnest

#   participants variables grades
# 1            1         a      4
# 2            1         b      7
# 3            1         c      8
# 4            1         e      5
# 5            1         f      8
# 6            2         a      9
# 7            2         b      4
# 8            2         c      2

Separate a column of variable and a column of values into their components in r

Answers (2)

Related Questions