This is an example of what I want to achieve. I'm currently using a for loop in R, but it is too slow on large data. What is a faster approach that still performs well when the output is a much larger data frame (e.g., more than 1000 rows and columns)?
library(dplyr)  # needed for bind_rows() in the loop below
df <- data.frame(id = c('a', 'a', 'b', 'c', 'c', 'c'), code = c(1, 2, 3, 3, 1, 2), stringsAsFactors = FALSE)
uid <- unique(df$id)
out <- NULL
df
  id code
1  a    1
2  a    2
3  b    3
4  c    3
5  c    1
6  c    2
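for (i in uid) {
  # transpose the rows for this id: row 1 holds the ids, row 2 the codes
  z <- t(df[df$id == i, ])
  # name each column after the code it contains
  colnames(z) <- z[2, ]
  # keep only the code row as a one-row data frame
  z <- as.data.frame(z[2, , drop = FALSE])
  # append it; codes missing for this id become NA
  out <- bind_rows(out, z)
}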
out
     1    2    3
1    1    2 <NA>
2 <NA> <NA>    3
3    1    2    3
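We can use complete() and spread() from tidyr: copy code into a second column, fill in the missing id/code combinations, then spread to wide format.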
library(dplyr)
library(tidyr)
df %>%
  mutate(code1 = code) %>%
  complete(id, code) %>%
  spread(code, code1)
# A tibble: 3 x 4
#  id      `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 a         1     2    NA
#2 b        NA    NA     3
#3 c         1     2     3
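Since spread() is superseded in current tidyr releases, the same reshape can probably be written with pivot_wider(). This is only a sketch, assuming tidyr >= 1.1.0 (for names_sort); the helper column name value and the result name wide are my own choices, not something from the question.

```r
library(dplyr)
library(tidyr)

# same toy data as in the question
df <- data.frame(id = c('a', 'a', 'b', 'c', 'c', 'c'),
                 code = c(1, 2, 3, 3, 1, 2),
                 stringsAsFactors = FALSE)

wide <- df %>%
  # copy code so it can serve as both the column name and the cell value
  mutate(value = code) %>%
  # one row per id, one column per code; absent combinations are filled with NA
  pivot_wider(id_cols = id, names_from = code, values_from = value,
              names_sort = TRUE)

wide
```

No complete() step is needed here, because pivot_wider() fills id/code combinations that do not occur with NA by default.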