Reputation: 3417
I'm doing text analysis using tidytext. I am trying to calculate the tf-idf for a corpus. The standard way to do this is:
book_words <- book_words %>%
  bind_tf_idf(word, book, n)
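To make clear what that call computes, here is a small sketch on a made-up toy table (the doc/word/n values are hypothetical, not from the question). bind_tf_idf treats each distinct value of the document column as one document. tf is n divided by that document's total count. idf is the natural log of the number of documents divided by the number of rows for the term, which equals the number of documents containing it when each (document, term) pair appears once, as in count() output. The manual dplyr version should reproduce the same tf_idf values.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy counts: two documents ("a", "b"), one row per (doc, word)
toy <- tibble(
  doc  = c("a", "a", "b", "b"),
  word = c("cat", "dog", "cat", "fish"),
  n    = c(3L, 1L, 2L, 2L)
)

# tidytext's version: adds tf, idf and tf_idf columns
toy_tfidf <- toy %>%
  bind_tf_idf(word, doc, n)

# The same quantities computed by hand
manual <- toy %>%
  group_by(doc) %>%
  mutate(tf = n / sum(n)) %>%                  # share of the document's words
  ungroup() %>%
  add_count(word, name = "docs_with_word") %>% # documents containing each word
  mutate(
    idf    = log(n_distinct(doc) / docs_with_word),
    tf_idf = tf * idf
  )

toy_tfidf$tf_idf
# 0, 0.25 * log(2), 0, 0.5 * log(2): "cat" occurs in both documents, so its idf is 0
```

Because the document is just "whatever values are in that one column", any combination of columns can serve as a document once it is turned into a single column.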
However, in my case, the 'document' is not defined by a single column (like book). Is it possible to call bind_tf_idf where the document is defined by two columns (for example, book and chapter)?
Upvotes: 0
Views: 417
Reputation: 54237
Why not concatenate both columns into a single document column? E.g., with a made-up chapter column for illustration:
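library(tidyverse)
library(tidytext)
library(janeaustenr)

# Word counts per book
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

# Demo only: give each row a random "chapter" so there is a second column
book_words$chapter <- sample(1:10, nrow(book_words), replace = TRUE)

book_words %>%
  # combine book and chapter into one document id, e.g. "Emma_3"
  # ("_" is safe here because no Austen title contains an underscore)
  unite("book_chapter", book, chapter) %>%
  bind_tf_idf(word, book_chapter, n) %>%
  # split the id back into its parts; convert = TRUE makes chapter numeric again
  separate(book_chapter, c("book", "chapter"), sep = "_", convert = TRUE) %>%
  arrange(desc(tf_idf))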
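If the data has real chapters rather than random ones, a sketch along these lines should work. It assumes each chapter starts with a heading line such as "Chapter 1" or "CHAPTER I" (the regex is adapted from the Text Mining with R book), and it uses unite(remove = FALSE) so the original book and chapter columns are kept and no separate() step is needed.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tidytext)
library(janeaustenr)

# Number chapters within each book by counting "Chapter ..." heading lines
chapter_words <- austen_books() %>%
  group_by(book) %>%
  mutate(chapter = cumsum(str_detect(
    text, regex("^chapter [\\divxlc]", ignore_case = TRUE)
  ))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%               # drop front matter before chapter 1
  unnest_tokens(word, text) %>%
  count(book, chapter, word, sort = TRUE)

chapter_tfidf <- chapter_words %>%
  # one document per (book, chapter); remove = FALSE keeps both source columns
  unite("book_chapter", book, chapter, remove = FALSE) %>%
  bind_tf_idf(word, book_chapter, n) %>%
  arrange(desc(tf_idf))

chapter_tfidf
```

Keeping the original columns also avoids any ambiguity when splitting the id back apart, which matters if a title could contain the separator.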
Upvotes: 3