emehex

Reputation: 10538

(Un)tidy a dataset with unequal sizes and duplicate variables

I have a dataset that looks like this:

df <- data.frame(
    x = c(rep("A", 3), rep("B", 2)), 
    y = c(1, 2, 6, 8, 3)
)

I need to (un)tidy it so that it looks like this:

df_new <- data.frame(
    A = c(1, 2, 6),
    B = c(8, 3, NA)
)

I tried tidyr::spread, but it throws a duplicate identifiers error.

Upvotes: 1

Views: 173

Answers (3)

akrun

Reputation: 887118

We can do this in base R: use unstack to split y by x into a list, pad each list element with NA at the end so that all elements have the same length, and convert the result to a data.frame.

lst <- unstack(df, y~x)
data.frame(lapply(lst, `length<-`, max(lengths(lst))))
#  A  B
#1 1  8
#2 2  3
#3 6 NA
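
The padding step relies on the replacement function `length<-`, which extends a vector with NA when the new length is larger than the current one. A minimal sketch of that behaviour on a single vector (the name v is only for illustration):

```r
# Assigning a larger length pads the vector with NA at the end
v <- c(8, 3)
length(v) <- 3
v
# [1]  8  3 NA
```

Inside lapply, the same function is applied to every list element with the maximum length as the target.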

Or, if a package is acceptable, a more compact option is stri_list2matrix from stringi:

library(stringi)
stri_list2matrix(split(df$y, df$x))

The output is a character matrix, which can be converted to numeric.
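
One possible way to do that conversion is sketched below. It assumes the df from the question; the names lst, m and out are introduced here for illustration, and the column names are restored by hand from the split list.

```r
library(stringi)

df <- data.frame(
    x = c(rep("A", 3), rep("B", 2)),
    y = c(1, 2, 6, 8, 3)
)

lst <- split(df$y, df$x)          # list of y values per level of x
m <- stri_list2matrix(lst)        # character matrix, shorter columns padded with NA
storage.mode(m) <- "double"       # convert the strings back to numbers
colnames(m) <- names(lst)         # label the columns with the group names
out <- as.data.frame(m)
```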

Upvotes: 1

emehex

Reputation: 10538

Using dplyr together with tidyr::complete and tidyr::spread:

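library(dplyr)
library(tidyr)

df_new <- df %>%
    group_by(x) %>% 
    mutate(index = row_number()) %>%     # position of each row within its group
    complete(index = 1:max(index)) %>% 
    ungroup() %>%                        # drop the grouping before reshaping
    spread(x, y, fill = NA) %>%          # one column per value of x
    select(-index)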

Upvotes: 0

Gregor Thomas

Reputation: 145775

tidyr (to my knowledge) won't let you do this without an ID column. So we'll add that first and then spread:

library(dplyr)
library(tidyr)

df %>% group_by(x) %>% 
    mutate(id = 1:n()) %>%
    spread(key = x, value = y, fill = NA)
# # A tibble: 3 x 3
#      id     A     B
# * <int> <dbl> <dbl>
# 1     1     1     8
# 2     2     2     3
# 3     3     6    NA

You can, of course, remove the id column at the end if you prefer.
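
For readers on tidyr 1.0.0 or later, where spread is superseded by pivot_wider, the same idea could be sketched as follows. This is an adaptation rather than part of the original answer: it assumes the df from the question, uses row_number() for the id, and the name df_wide is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
    x = c(rep("A", 3), rep("B", 2)),
    y = c(1, 2, 6, 8, 3)
)

df_wide <- df %>%
    group_by(x) %>%
    mutate(id = row_number()) %>%   # row position within each group
    ungroup() %>%
    pivot_wider(names_from = x, values_from = y) %>%  # missing cells become NA
    select(-id)                     # drop the helper id column
```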

Upvotes: 3
