I have a large data.table where one column contains text. Here is a simple example:
x = data.table(text = c("This is the first text", "Second text"))
I would like to get a data.table with one column containing all the words of all the texts. Here is what I tried:
x[, strsplit(text, " ")]
text
1: This is the first text
2: Second text
Which results in:
V1 V2
1: This Second
2: is text
3: the Second
4: first text
5: text Second
The result I would like to get is:
text
1: This
2: is
3: the
4: first
5: text
6: Second
7: text
As mentioned by @Henrik in the comments, you could use cSplit from the splitstackshape package for this task:
library(splitstackshape)
cSplit(x, "text", sep = " ", direction = "long")
Which gives:
# text
#1: This
#2: is
#3: the
#4: first
#5: text
#6: Second
#7: text
You could also create a column to help identify the initial sentences in the result:
x %>% dplyr::mutate(n = 1:n()) %>% cSplit(., "text", " ", "long")
Which gives:
# text n
#1: This 1
#2: is 1
#3: the 1
#4: first 1
#5: text 1
#6: Second 2
#7: text 2
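If you would rather not load dplyr for the sentence index, the same kind of result can probably be obtained with data.table alone. This is only a sketch, not part of the cSplit approach above: it assumes one group per row is acceptable for your data size, and the column name n is simply reused from the example above.

```r
library(data.table)

x <- data.table(text = c("This is the first text", "Second text"))

# Group by a row index (named n here), then split each row's text into words.
# fixed = TRUE treats " " as a literal string rather than a regular expression.
res <- x[, .(text = unlist(strsplit(text, " ", fixed = TRUE))),
         by = .(n = seq_len(nrow(x)))]
res
```

Note that the grouping column comes first in the result, so n appears before text.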
You are close. What you are looking for is:
data.table(text=unlist(strsplit(x$text, " ")))
# text
#1: This
#2: is
#3: the
#4: first
#5: text
#6: Second
#7: text
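To see why the original attempt produced two columns: strsplit returns a list with one character vector per row, and data.table treats each element of a list returned in j as a separate column (in the output shown in the question, the shorter vector was recycled). unlist concatenates the vectors in row order instead. A small sketch to check this, where the intermediate names words and flat are only illustrative:

```r
library(data.table)

x <- data.table(text = c("This is the first text", "Second text"))

# One list element per input row, each holding that row's words
words <- strsplit(x$text, " ", fixed = TRUE)

# Number of words per row: 5 for the first text, 2 for the second
lengths(words)

# Flatten the list into a single character vector, then wrap it in a data.table
flat <- data.table(text = unlist(words))
flat
```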