user2165857

Reputation: 2690

Reshape a data frame to long format by expanding elements of an existing column

I have a dataframe with 3 columns:

A <- c("stringA", "stringA", "stringB", "stringB")
B <- c(1, 2, 1, 2)
C <- c("abcd", "abcd", "abcde", "bbc")

df <- data.frame(A, B, C)

> df
        A B     C
1 stringA 1  abcd
2 stringA 2  abcd
3 stringB 1 abcde
4 stringB 2   bbc

I would like to reshape it so that the values in column B become the column names and each string in column C is split into individual letters, one letter per row, to get:

A          1    2
stringA    a    a
stringA    b    b
stringA    c    c
stringA    d    d
stringB    a    b
stringB    b    b
stringB    c    c
stringB    d    NA
stringB    e    NA

Upvotes: 1

Views: 136

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Here's an approach using the "data.table" and "reshape2" packages. It requires at least version 1.8.11 of "data.table", so check your installed version first.

library(reshape2)
library(data.table)
packageVersion("data.table")
# [1] ‘1.8.11’

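## Key by A and B so the grouping columns can be reused via key(DT)
DT <- data.table(df, key="A,B")
## Split each string in C into single characters, one row per character
DT <- DT[, list(C = unlist(strsplit(as.character(C), ""))), by = key(DT)]
## Number the characters within each A/B group
DT[, N := sequence(.N), by = key(DT)]
## Spread the values of B into columns
dcast.data.table(DT, A + N ~ B, value.var="C")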
#          A N 1  2
# 1: stringA 1 a  a
# 2: stringA 2 b  b
# 3: stringA 3 c  c
# 4: stringA 4 d  d
# 5: stringB 1 a  b
# 6: stringB 2 b  b
# 7: stringB 3 c  c
# 8: stringB 4 d NA
# 9: stringB 5 e NA

If you prefer to stick with base R, the approach is similar:

## Split the "C" column up
X <- strsplit(as.character(df$C), "")

## "Expand" your data.frame
df2 <- df[rep(seq_along(X), sapply(X, length)), ]

## Create an additional "id"
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along))

## Replace your "C" values
df2$C <- unlist(X)

## Reshape your data
reshape(df2, direction = "wide", idvar=c("A", "id"), timevar="B")
#           A id C.1  C.2
# 1   stringA  1   a    a
# 1.1 stringA  2   b    b
# 1.2 stringA  3   c    c
# 1.3 stringA  4   d    d
# 3   stringB  1   a    b
# 3.1 stringB  2   b    b
# 3.2 stringB  3   c    c
# 3.3 stringB  4   d <NA>
# 3.4 stringB  5   e <NA>
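
If you want something closer to the exact layout in the question (no "id" column, plain row names), the base R steps above could be wrapped up roughly like this. This is only a sketch: `split_to_wide` is a made-up helper name, it assumes the columns are called "A", "B" and "C" as in the example, and it uses `lengths()`, which needs R 3.2.0 or later.

```r
## Example data from the question
A <- c("stringA", "stringA", "stringB", "stringB")
B <- c(1, 2, 1, 2)
C <- c("abcd", "abcd", "abcde", "bbc")
df <- data.frame(A, B, C)

## Hypothetical helper (not part of any package): same steps as above,
## followed by dropping the temporary "id" column
split_to_wide <- function(df) {
  ## Split each string in "C" into single characters
  X <- strsplit(as.character(df$C), "")
  ## Repeat each row once per character, then fill in the characters
  out <- df[rep(seq_along(X), lengths(X)), ]
  out$C <- unlist(X)
  ## Position of each character within its A/B group
  out$id <- ave(seq_len(nrow(out)), out$A, out$B, FUN = seq_along)
  ## One column per value of "B"
  wide <- reshape(out, direction = "wide", idvar = c("A", "id"), timevar = "B")
  ## Reset the row names and drop the temporary id
  rownames(wide) <- NULL
  wide[, names(wide) != "id"]
}

res <- split_to_wide(df)
res
```

The result keeps the "C.1" and "C.2" column names that `reshape` generates; rename them afterwards if you want plain "1" and "2".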

Upvotes: 3
