Reputation: 1238
library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go, and I teach"), stringsAsFactors = FALSE)
myDfm <- df$text %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart")))
How can I unlist the result and get a data frame in this format?
data.frame(id = c(1,2), text = c("loving", "hating teach"))
I tried to unlist it using this:
unlist(myDfm$text[1:length(myDfm)])
Upvotes: 1
Reputation: 14902
Here's how:
data.frame(
  id = seq_along(myDfm),
  text = sapply(myDfm, paste, collapse = " "),
  row.names = NULL
)
##   id         text
## 1  1       loving
## 2  2 hating teach
Note that your myDfm is a tokens object, not a dfm.
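To see the difference, here is a minimal sketch, assuming the quanteda package is installed. The names txt, toks and dfmat are chosen here for illustration and are not from the question. It checks the class of the tokens object and then builds an actual dfm from it:

library(quanteda)

txt <- c("I am loving it", "I am hating it but I go, and I teach")

# tokens() returns a tokens object: one vector of tokens per document
toks <- tokens(txt, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
toks <- tokens_remove(toks, pattern = stopwords(source = "smart"))
class(toks)

# dfm() converts the tokens into a document-feature matrix of counts
dfmat <- dfm(toks)
class(dfmat)

# the features (columns) of the dfm are the distinct remaining tokens
featnames(dfmat)

The pasting approach above works on the tokens object directly, so the dfm() step is only needed if you actually want the count matrix.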
Upvotes: 0
Reputation: 10845
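The text data can be extracted as follows. Note that unlist() returns one element per token, so this only lines up with the ids when each document is left with exactly one token, as in the shortened example text used here.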
library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go"), stringsAsFactors = FALSE)
myDfm <- df$text %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart")))
data.frame(id = 1:length(myDfm),text = unlist(myDfm))
...and the output:
> data.frame(id = 1:length(myDfm),text = unlist(myDfm))
      id   text
text1  1 loving
text2  2 hating
Upvotes: 1