milan

Reputation: 4970

Combine rows into a wide data frame in R, allowing NA, without using a loop

This is an example of what I want to achieve. I'm currently using a for loop in R, but it is too slow on large data. What is a better approach that stays fast when the output is a much larger data frame (e.g., more than 1000 rows and columns)?

library(dplyr)  # needed for bind_rows() in the loop

df <- data.frame(id = c('a', 'a', 'b', 'c', 'c', 'c'), code = c(1, 2, 3, 3, 1, 2), stringsAsFactors = FALSE)
uid <- unique(df$id)
out <- NULL
df

  id code
1  a    1
2  a    2
3  b    3
4  c    3
5  c    1
6  c    2

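for (i in uid) {
  # Transpose the rows for this id: row 1 holds id, row 2 holds code
  z <- t(df[df$id == i, ])
  # Name each column after its code value
  colnames(z) <- z[2, ]
  # Keep only the code row, as a one-row data frame
  z <- as.data.frame(z[2, , drop = FALSE])
  # bind_rows() fills codes missing for this id with NA
  out <- bind_rows(out, z)
}
out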

     1    2    3
1    1    2 <NA>
2 <NA> <NA>    3
3    1    2    3

Upvotes: 1

Views: 47

Answers (2)

Ronak Shah

Reputation: 388982

We can use complete() and spread() from tidyr:

library(dplyr)
library(tidyr)

df %>%
  mutate(code1 = code) %>%
  complete(id, code) %>%
  spread(code, code1)

# A tibble: 3 x 4
#  id      `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 a         1     2    NA
#2 b        NA    NA     3
#3 c         1     2     3
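
In current tidyr releases spread() is superseded by pivot_wider(), so the same reshape can also be sketched as below. This is a minimal sketch, assuming tidyr 1.1.0 or later (for names_sort); the helper column name value is only illustrative and does not come from the question.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(id = c("a", "a", "b", "c", "c", "c"),
                 code = c(1, 2, 3, 3, 1, 2),
                 stringsAsFactors = FALSE)

wide <- df %>%
  mutate(value = code) %>%          # copy of code used to fill the cells (illustrative name)
  pivot_wider(names_from = code,    # one output column per distinct code
              values_from = value,  # cell contents
              names_sort = TRUE)    # order the new columns by code value

wide
```

Combinations of id and code that do not occur in df are left as NA, so no separate complete() step is needed here.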

Upvotes: 2

Fino

Reputation: 1784

Is this fast enough?

library(reshape2)

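# value.var names the column that fills the cells; without it dcast() guesses and prints a message
dcast(df, id ~ code, value.var = "code")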
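
If that is still too slow at the sizes mentioned in the question, the data.table package has its own dcast() method with the same formula interface. The following is only a sketch under the assumption that data.table is installed; it swaps in a different package from the one above, and it has not been benchmarked here.

```r
library(data.table)

# Same sample data as in the question
df <- data.frame(id = c("a", "a", "b", "c", "c", "c"),
                 code = c(1, 2, 3, 3, 1, 2),
                 stringsAsFactors = FALSE)

# Convert to a data.table, then cast to wide format:
# one row per id, one column per code, NA where a combination is absent
dt <- as.data.table(df)
wide_dt <- dcast(dt, id ~ code, value.var = "code")

wide_dt
```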

Upvotes: 3
