Nathalie

Reputation: 1238

Unlist all items from quanteda tokens object into data frame

library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go, and I teach"), stringsAsFactors = FALSE)

myDfm <- df$text %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart")))

How can I unlist the result into a data frame with this format?

data.frame(id = c(1,2), text = c("loving", "hating teach"))

I tried to unlist it using this:

unlist(myDfm$text[1:length(myDfm)])

Upvotes: 1

Views: 765

Answers (2)

Ken Benoit

Reputation: 14902

Here's how:

data.frame(
  id = seq_along(myDfm),
  text = sapply(myDfm, paste, collapse = " "),
  row.names = NULL
)
##   id         text
## 1  1       loving
## 2  2 hating teach

Note that your myDfm is a tokens object, not a dfm.
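
To make the distinction concrete, here is a minimal sketch, assuming quanteda is installed. The names `txt`, `toks` and `toks_dfm` are illustrative and do not come from the question.

```r
library(quanteda)

txt <- c("I am loving it", "I am hating it but I go, and I teach")

# tokens() returns a tokens object: one vector of tokens per document
toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_remove(toks, pattern = stopwords(source = "smart"))
is.tokens(toks)   # TRUE
is.dfm(toks)      # FALSE

# dfm() builds a document-feature matrix from the tokens
toks_dfm <- dfm(toks)
is.dfm(toks_dfm)  # TRUE
```

A tokens object keeps the tokens of each document in order, which is why pasting them back together per document works. A dfm only stores counts per feature, so word order is not recoverable from it.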

Upvotes: 0

Len Greski

Reputation: 10845

The text data can be extracted as follows.

library(quanteda)
library(tidyr)
df <- data.frame(id = c(1,2), text = c("I am loving it", "I am hating it but I go"), stringsAsFactors = FALSE)

myDfm <- df$text %>%
     tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
     tokens_remove(pattern = c(stopwords(source = "smart")))

data.frame(id = 1:length(myDfm),text = unlist(myDfm))

...and the output:

> data.frame(id = 1:length(myDfm),text = unlist(myDfm))
      id   text
text1  1 loving
text2  2 hating
> 
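
Note that this works here because each document is left with exactly one token. With the question's original second text (ending in "and I teach"), `unlist()` returns one element per token, so its length would no longer match `id`. A hedged sketch that collapses the tokens per document first, using `vapply()` as a stricter stand-in for `sapply()`; the names `txt`, `toks` and `out` are illustrative:

```r
library(quanteda)

txt <- c("I am loving it", "I am hating it but I go, and I teach")
toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_remove(toks, pattern = stopwords(source = "smart"))

# Paste each document's tokens into a single string, so the result
# has exactly one row per document regardless of token counts
out <- data.frame(
  id = seq_along(toks),
  text = vapply(as.list(toks), paste, character(1), collapse = " "),
  row.names = NULL,
  stringsAsFactors = FALSE
)
out
```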

Upvotes: 1
