Reputation: 4205
I have the following data frame (df) where for each participant, the column "grades" was built by pasting the scores on some items. So for example, the participant '1' has obtained 4 on one item 'a', 7 on item 'b', and 8 on item 'c'. (Note that grades is a character). Note also that participant '2' has only two scores (let say this was a different exam from participant '1').
df = data.frame(participants = c(1, 1, 2),
variables = c('abc', 'ef', 'abc'),
grades= c('478', '58', '942'),
stringsAsFactors = FALSE)
participants variables grades
1 1 abc 478
2 1 ef 58
3 2 abc 942
(my data contains 100,000 rows like the above.)
I want to transform the data into a tidy style, as the following:
participants variables grades
1 1 a 4
2 1 b 7
3 1 c 8
4 1 e 5
5 1 f 8
6 2 a 9
7 2 b 4
8 2 c 2
Here what I did
variables = lapply(X=1:length(df$variables), FUN=function(X) {
strsplit(df$variables[X], "") %>% .[[1]]}) %>% reduce(c)
grades = lapply(X=1:length(df$grades), FUN=function(X) {
strsplit(df$grades[X], "") %>% .[[1]]}) %>% reduce(c)
participants = lapply(X=1:length(df$participants), FUN=function(X) {
rep(df$participants[X], nchar(df$variables[X])) })%>% reduce(c)
data.frame(participants, variables, grades)
Yet, it takes several minutes on my machine with my real data, and I feel it is really not efficient because I need 3 different calls.
Any idea for an efficient way to obtain the tidy data would be welcome (I work with tidyr/dplyr)
Upvotes: 2
Views: 34
Reputation: 11480
data.table solution
setDT(df)
df[,lapply(.SD,function(x){strsplit(x,"") %>% unlist}),.SDcols=c("variables","grades"),by=participants]
result:
participants variables grades
1: 1 a 4
2: 1 b 7
3: 1 c 8
4: 1 e 5
5: 1 f 8
6: 2 a 9
7: 2 b 4
8: 2 c 2
>
Upvotes: 2
Reputation: 47320
You can use strsplit
and unnest
:
library(tidyverse)
df %>% mutate_at(vars(variables,grades),~strsplit(.,"")) %>% unnest
# participants variables grades
# 1 1 a 4
# 2 1 b 7
# 3 1 c 8
# 4 1 e 5
# 5 1 f 8
# 6 2 a 9
# 7 2 b 4
# 8 2 c 2
Upvotes: 3