Reputation: 15718
So I have the data.frame
dat = data.frame(x = c('Sir Lancelot the Brave', 'King Arthur',
                       'The Black Knight', 'The Rabbit'), stringsAsFactors=FALSE)
> dat
                       x
1 Sir Lancelot the Brave
2            King Arthur
3       The Black Knight
4             The Rabbit
And I want to transform it into the data frame
> dat2
                       x    1        2      3     4
1 Sir Lancelot the Brave  Sir Lancelot    the Brave
2            King Arthur King   Arthur
3       The Black Knight  The    Black Knight
4             The Rabbit  The   Rabbit
strsplit returns the data as a list
sbt <- strsplit(dat$x, " ")
> sbt
[[1]]
[1] "Sir" "Lancelot" "the" "Brave"
[[2]]
[1] "King" "Arthur"
[[3]]
[1] "The" "Black" "Knight"
[[4]]
[1] "The" "Rabbit"
and as.data.table does not pad the shorter vectors with NA values where it should, but recycles their values instead
> t(as.data.table(sbt))
   [,1]   [,2]       [,3]     [,4]
V1 "Sir"  "Lancelot" "the"    "Brave"
V2 "King" "Arthur"   "King"   "Arthur"
V3 "The"  "Black"    "Knight" "The"
V4 "The"  "Rabbit"   "The"    "Rabbit"
I guess what I would really like is an argument along the lines of as.data.table(x, repeat=FALSE). Failing that, how can I accomplish this?
Upvotes: 10
Views: 7264
Reputation: 4513
Here is a simple approach with tidyr.
library(tidyr)
# Number of columns needed: the largest word count in any row
# (not the number of rows in dat)
ncol <- max(lengths(strsplit(dat$x, " ")))
dat %>%
  separate(x, paste0("V", seq_len(ncol)), sep = " ")
Note: You will get a warning, but it only tells you that separate is padding the shorter rows with NAs, so you can ignore it (or pass fill = "right" to silence it).
Upvotes: 0
Reputation: 193527
This is an old question, I know, but I thought I would share two additional options.
concat.split from my "splitstackshape" package was designed exactly for this type of thing.
library(splitstackshape)
concat.split(dat, "x", " ")
#                        x  x_1      x_2    x_3   x_4
# 1 Sir Lancelot the Brave  Sir Lancelot    the Brave
# 2            King Arthur King   Arthur
# 3       The Black Knight  The    Black Knight
# 4             The Rabbit  The   Rabbit
data.table has recently (as of version 1.8.11, I believe) had some additions to its arsenal, notably in this case dcast.data.table. To use it, unlist the split data (as was done in @mnel's answer), create a "time" variable using .N (how many new values per row), and use dcast.data.table to transform the data into the form you are looking for.
library(data.table)
library(reshape2)
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(dat)
S1 <- DT[, list(X = unlist(strsplit(x, " "))), by = seq_len(nrow(DT))]
S1[, Time := sequence(.N), by = seq_len]
dcast.data.table(S1, seq_len ~ Time, value.var="X")
#    seq_len    1        2      3     4
# 1:       1  Sir Lancelot    the Brave
# 2:       2 King   Arthur     NA    NA
# 3:       3  The    Black Knight    NA
# 4:       4  The   Rabbit     NA    NA
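On newer data.table releases than the 1.8.11 shown above, the unlist / .N / dcast sequence can probably be shortened. The following is a minimal sketch, assuming an installed data.table that provides tstrsplit() (a later addition to the package); the name parts is only illustrative.

```r
library(data.table)

# Sample data from the question
DT <- data.table(x = c("Sir Lancelot the Brave", "King Arthur",
                       "The Black Knight", "The Rabbit"))

# tstrsplit() splits each string and transposes the pieces into columns;
# rows with fewer pieces are filled with NA by default
parts <- DT[, tstrsplit(x, " ", fixed = TRUE)]
parts
```

The result has one column per word position (V1 to V4 here), so no separate "time" variable or reshaping step is needed.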
Upvotes: 11
Reputation: 59602
sbt = strsplit(dat$x, " ")
sbt
#[[1]]
#[1] "Sir" "Lancelot" "the" "Brave"
#[[2]]
#[1] "King" "Arthur"
#[[3]]
#[1] "The" "Black" "Knight"
#[[4]]
#[1] "The" "Rabbit"
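# Width of the result: the length of the longest split vector
ncol = max(sapply(sbt,length))
ncol
# [1] 4
# Take the i-th word of every row; rows that are too short yield NA
library(data.table)
as.data.table(lapply(1:ncol,function(i)sapply(sbt,"[",i)))
#      V1       V2     V3    V4
# 1:  Sir Lancelot    the Brave
# 2: King   Arthur     NA    NA
# 3:  The    Black Knight    NA
# 4:  The   Rabbit     NA    NA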
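The code above works because indexing an atomic vector past its end returns NA instead of raising an error, and because "[" is an ordinary function that sapply can call with a fixed index. A small base R sketch of that behaviour (the names v, third and col3 are illustrative only):

```r
v <- c("King", "Arthur")

# Indexing past the end of an atomic vector gives NA, not an error
third <- v[3]

# "[" can be passed to sapply like any other function;
# here it extracts the third word from each element of the list
sbt <- list(c("Sir", "Lancelot", "the", "Brave"), v)
col3 <- sapply(sbt, "[", 3)
col3
```

Each call to sapply(sbt, "[", i) therefore produces one complete column, already padded with NA where a row has fewer than i words.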
Upvotes: 7
Reputation: 115390
Using data.table, as it appears you are trying to use it.
library(data.table)
DT <- data.table(dat)
DTB <- DT[, list(y = unlist(strsplit(x, ' '))), by = x]
new <- rep(NA_character_, DTB[,.N,by =x][which.max(N), N])
names(new) <- paste0('V', seq_along(new))
DTB[, {.new <- new
       .new[seq_len(.N)] <- y
       as.list(.new)}, by = x]
Or using reshape2 dcast to reshape
library(reshape2)
dcast(DTB[,list(id = seq_len(.N),y),by= x ], x ~id, value.var = 'y')
Upvotes: 2
Reputation: 162321
Here's one option. The single complication is that you need to first convert each vector to a data.frame with one row, as data.frames are what rbind.fill() expects.
library(plyr)
rbind.fill(lapply(sbt, function(X) data.frame(t(X))))
#     X1       X2     X3    X4
# 1  Sir Lancelot    the Brave
# 2 King   Arthur   <NA>  <NA>
# 3  The    Black Knight  <NA>
# 4  The   Rabbit   <NA>  <NA>
My own inclination, though, would be to just use base R, like this:
n <- max(sapply(sbt, length))
l <- lapply(sbt, function(X) c(X, rep(NA, n - length(X))))
data.frame(t(do.call(cbind, l)))
#     X1       X2     X3    X4
# 1  Sir Lancelot    the Brave
# 2 King   Arthur   <NA>  <NA>
# 3  The    Black Knight  <NA>
# 4  The   Rabbit   <NA>  <NA>
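A slightly more compact variant of the same base R idea is sketched below. It assumes R 3.2.0 or later (for lengths()) and uses the replacement function `length<-` to do the padding; the names padded and res are illustrative only.

```r
# Sample input from the question
x <- c("Sir Lancelot the Brave", "King Arthur", "The Black Knight", "The Rabbit")
sbt <- strsplit(x, " ", fixed = TRUE)

# The longest split vector determines the number of columns
n <- max(lengths(sbt))

# `length<-` extends each vector to length n, filling with NA.
# vapply returns one column per input string, so transpose the matrix
padded <- t(vapply(sbt, `length<-`, character(n), n))

res <- data.frame(padded, stringsAsFactors = FALSE)
res
```

Using vapply with character(n) as the template also guards against a split vector that would end up with the wrong length or type.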
Upvotes: 10