Reputation: 2690
I have a dataframe with 3 columns:
A <- c("stringA", "stringA", "stringB", "stringB")
B <- c(1, 2, 1, 2)
C <- c("abcd", "abcd", "abcde", "bbc")
df <- data.frame(A, B, C)
> df
        A B     C
1 stringA 1  abcd
2 stringA 2  abcd
3 stringB 1 abcde
4 stringB 2   bbc
I would like to reshape it so that the values in column B become the column names and each string in column C is split into individual letters, one letter per row, to get:
      A 1  2
stringA a  a
stringA b  b
stringA c  c
stringA d  d
stringB a  b
stringB b  b
stringB c  c
stringB d NA
stringB e NA
Upvotes: 1
Views: 136
Reputation: 193517
Here's an approach using "data.table" and "reshape2". Make sure you're using at least version 1.8.11 of the "data.table" package first.
library(reshape2)
library(data.table)
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(df, key="A,B")
DT <- DT[, list(C = unlist(strsplit(as.character(C), ""))), by = key(DT)]
DT[, N := sequence(.N), by = key(DT)]
dcast.data.table(DT, A + N ~ B, value.var="C")
#          A N 1  2
# 1: stringA 1 a  a
# 2: stringA 2 b  b
# 3: stringA 3 c  c
# 4: stringA 4 d  d
# 5: stringB 1 a  b
# 6: stringB 2 b  b
# 7: stringB 3 c  c
# 8: stringB 4 d NA
# 9: stringB 5 e NA
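For readers on a newer "data.table", the same three steps (split, number the letters within each group, cast to wide) can probably be written without "reshape2". This is only a sketch: it assumes a release where `dcast()` has a data.table method (1.9.6 or later, as far as I know), and the names `long` and `wide` are illustrative, not part of the answer above.

```r
library(data.table)

# Same example data as in the question
df <- data.frame(A = c("stringA", "stringA", "stringB", "stringB"),
                 B = c(1, 2, 1, 2),
                 C = c("abcd", "abcd", "abcde", "bbc"))

DT <- as.data.table(df)

# Step 1: one row per letter, grouped by the original A/B pair
long <- DT[, .(C = unlist(strsplit(as.character(C), ""))), by = .(A, B)]

# Step 2: position of each letter within its A/B group
long[, N := seq_len(.N), by = .(A, B)]

# Step 3: cast to wide, one column per value of B
# (assumes dcast() dispatches to the data.table method)
wide <- dcast(long, A + N ~ B, value.var = "C")
```

Positions that exist for one value of B but not the other (here, letters 4 and 5 of "stringB" when B is 2) are filled with NA, as in the output shown above.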
If you prefer sticking with base R, the approach is somewhat similar:
## Split the "C" column up
X <- strsplit(as.character(df$C), "")
## "Expand" your data.frame
df2 <- df[rep(seq_along(X), sapply(X, length)), ]
## Create an additional "id"
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along))
## Replace your "C" values
df2$C <- unlist(X)
## Reshape your data
reshape(df2, direction = "wide", idvar=c("A", "id"), timevar="B")
#           A id C.1  C.2
# 1   stringA  1   a    a
# 1.1 stringA  2   b    b
# 1.2 stringA  3   c    c
# 1.3 stringA  4   d    d
# 3   stringB  1   a    b
# 3.1 stringB  2   b    b
# 3.2 stringB  3   c    c
# 3.3 stringB  4   d <NA>
# 3.4 stringB  5   e <NA>
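The "expand" step is the least obvious part of the base R version, so here is a small sketch of what it does on the same example data. The names `n_chars`, `row_idx`, and `letters_long` are illustrative only and do not appear in the code above.

```r
# Same example data as in the question
df <- data.frame(A = c("stringA", "stringA", "stringB", "stringB"),
                 B = c(1, 2, 1, 2),
                 C = c("abcd", "abcd", "abcde", "bbc"))

# One character vector of single letters per original row
X <- strsplit(as.character(df$C), "")

# Number of letters in each row's string
n_chars <- sapply(X, length)

# Repeat each row index once per letter, so row i appears n_chars[i] times
row_idx <- rep(seq_along(X), n_chars)

# unlist() flattens the list in the same row order, so the i-th letter
# belongs to the original row row_idx[i]
letters_long <- unlist(X)
```

Because `row_idx` and `letters_long` have the same length and the same ordering, indexing `df` with `row_idx` and then overwriting `C` with the flattened letters keeps every letter attached to its original `A` and `B` values.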
Upvotes: 3