Rtist
Rtist

Reputation: 4205

Separate a column of variable and a column of values into their components in r

I have the following data frame (df) where for each participant, the column "grades" was built by pasting the scores on some items. So for example, the participant '1' has obtained 4 on one item 'a', 7 on item 'b', and 8 on item 'c'. (Note that grades is a character). Note also that participant '2' has only two scores (let say this was a different exam from participant '1').

df = data.frame(participants = c(1, 1, 2),
                variables = c('abc', 'ef', 'abc'),
                grades= c('478', '58', '942'),
                stringsAsFactors = FALSE)


 participants variables grades
1            1       abc    478
2            1        ef     58
3            2       abc    942

(my data contains 100,000 rows like the above.)

I want to transform the data into a tidy style, as the following:

  participants variables grades
1            1         a      4
2            1         b      7
3            1         c      8
4            1         e      5
5            1         f      8
6            2         a      9
7            2         b      4
8            2         c      2

Here what I did

variables =  lapply(X=1:length(df$variables), FUN=function(X) {
  strsplit(df$variables[X], "") %>% .[[1]]}) %>% reduce(c)

grades =  lapply(X=1:length(df$grades), FUN=function(X) {
  strsplit(df$grades[X], "") %>% .[[1]]}) %>% reduce(c)

participants =  lapply(X=1:length(df$participants), FUN=function(X) {
  rep(df$participants[X], nchar(df$variables[X])) })%>% reduce(c)

data.frame(participants, variables, grades)

Yet, it takes several minutes on my machine with my real data, and I feel it is really not efficient because I need 3 different calls.

Any idea for an efficient way to obtain the tidy data would be welcome (I work with tidyr/dplyr)

Upvotes: 2

Views: 34

Answers (2)

Andre Elrico
Andre Elrico

Reputation: 11480

data.table solution

setDT(df)
df[,lapply(.SD,function(x){strsplit(x,"") %>% unlist}),.SDcols=c("variables","grades"),by=participants]

result:

   participants variables grades
1:            1         a      4
2:            1         b      7
3:            1         c      8
4:            1         e      5
5:            1         f      8
6:            2         a      9
7:            2         b      4
8:            2         c      2
> 

Upvotes: 2

moodymudskipper
moodymudskipper

Reputation: 47320

You can use strsplit and unnest:

library(tidyverse)
df %>% mutate_at(vars(variables,grades),~strsplit(.,"")) %>% unnest

#   participants variables grades
# 1            1         a      4
# 2            1         b      7
# 3            1         c      8
# 4            1         e      5
# 5            1         f      8
# 6            2         a      9
# 7            2         b      4
# 8            2         c      2

Upvotes: 3

Related Questions