Unlist all items from quanteda tokens object into data frame

Question

library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go, and I teach"), stringsAsFactors = FALSE)

myDfm <- df$text %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart")))

How can I unlist the tokens object into a data frame with this format?

data.frame(id = c(1,2), text = c("loving", "hating teach"))

I tried to unlist it using this:

unlist(myDfm$text[1:length(myDfm)])

Len Greski · Accepted Answer

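The text data can be extracted as follows. Note that the second text is shortened here so that each document keeps exactly one token after stopword removal; unlist() returns one element per token, so the id column only lines up when every document has a single remaining token.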

library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go"), stringsAsFactors = FALSE)

myDfm <- df$text %>%
     tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
     tokens_remove(pattern = c(stopwords(source = "smart")))

data.frame(id = 1:length(myDfm),text = unlist(myDfm))

...and the output:

> data.frame(id = 1:length(myDfm),text = unlist(myDfm))
      id   text
text1  1 loving
text2  2 hating
>
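
For the original two-sentence input from the question, where the second document keeps more than one token, a per-document collapse may be closer to the requested format. The following is only a sketch, not part of the accepted answer: it assumes quanteda's as.list() method for tokens objects, and the names toks, collapsed, and result are made up for illustration.

```r
library(quanteda)

df <- data.frame(id = c(1, 2),
                 text = c("I am loving it", "I am hating it but I go, and I teach"),
                 stringsAsFactors = FALSE)

# Tokenise and drop SMART stopwords, as in the question
toks <- tokens(df$text, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
toks <- tokens_remove(toks, pattern = stopwords(source = "smart"))

# as.list() yields one character vector per document;
# paste each vector into a single space-separated string
collapsed <- vapply(as.list(toks), paste, character(1), collapse = " ")

# One row per document, reusing the ids from the original data frame
result <- data.frame(id = df$id, text = unname(collapsed), stringsAsFactors = FALSE)
result
```

Because each document is collapsed to a single string before the data frame is built, the number of rows always matches the number of ids, regardless of how many tokens survive stopword removal.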
