Faye

Reputation: 63

R Tidytext and unnest_tokens error

I'm very new to R and have started using the tidytext package.

I'm trying to pass the column names into the unnest_tokens function as arguments so I can analyze multiple columns. So instead of this:

library(janeaustenr)
library(tidytext)
library(dplyr)
library(stringr)

original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE)))) %>%
  ungroup()

original_books

tidy_books <- original_books %>%
              unnest_tokens(word, text)

I want to store the column names in variables, so the last line of code would be:

output <- 'word'
input <- 'text'

tidy_books <- original_books %>%
              unnest_tokens(output, input)

But I get this error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

I've tried using as.character() without much luck.

Any ideas on how this would work?

Upvotes: 4

Views: 11178

Answers (2)

vegatroz

Reputation: 44

I had the same issue. I solved it by specifying the input as below:

unnest_tokens(input = "events", token = "words", "word")

where "events" is my column name.

Upvotes: 0

Weihuang Wong

Reputation: 13118

Try

tidy_books <- original_books %>% 
              unnest_tokens_(output, input)

with the underscore in unnest_tokens_.

unnest_tokens_ is the "standard evaluation" version of unnest_tokens: it takes the column names as strings, so variables that hold those strings work. See Non-standard evaluation for a discussion of standard vs non-standard evaluation.
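To see why the variables output and input were taken literally, here is a base-R sketch of the two evaluation styles. The functions tokenize_nse(), tokenize_se() and col_name() are hypothetical stand-ins, not tidytext code. The NSE wrapper captures its column arguments with substitute(), roughly how older versions of unnest_tokens read them. The SE function takes the column names as ordinary strings, like unnest_tokens_. The regex word split is a crude tokenizer assumed only for illustration.

```r
# Hypothetical helper: turn a captured argument into a column name.
# A string literal ("text") is used as-is; a bare symbol (text) is
# converted to its own name.
col_name <- function(expr) {
  if (is.character(expr)) expr else deparse(expr)
}

# Hypothetical standard-evaluation (SE) tokenizer, analogous in spirit
# to unnest_tokens_(): column names arrive as ordinary strings.
tokenize_se <- function(df, output_col, input_col) {
  x <- df[[input_col]]  # NULL when no column has that name
  if (!is.character(x)) {
    stop("Input must be a character vector")
  }
  # Crude word split, assumed for illustration only
  tokens <- strsplit(tolower(x), "[^[:alnum:]']+")
  tokens <- lapply(tokens, function(t) t[nzchar(t)])
  # Repeat each row once per token, dropping the input column
  keep <- setdiff(names(df), input_col)
  out <- df[rep(seq_len(nrow(df)), lengths(tokens)), keep, drop = FALSE]
  out[[output_col]] <- unlist(tokens)
  rownames(out) <- NULL
  out
}

# Hypothetical non-standard-evaluation (NSE) wrapper: it records the
# expression the caller typed, not the value that expression holds.
tokenize_nse <- function(df, output, input) {
  output_col <- col_name(substitute(output))
  input_col  <- col_name(substitute(input))
  tokenize_se(df, output_col, input_col)
}

books <- data.frame(
  book = c("Emma", "Persuasion"),
  text = c("Emma Woodhouse, handsome", "Sir Walter Elliot"),
  stringsAsFactors = FALSE
)
output <- "word"
input  <- "text"

tokenize_nse(books, word, text)      # bare column names: works
tokenize_nse(books, "word", "text")  # string literals: works
tokenize_se(books, output, input)    # strings held in variables: works

# NSE sees the names `output` and `input`, looks for a column called
# "input", finds none, and stops
res_err <- tryCatch(
  tokenize_nse(books, output, input),
  error = function(e) conditionMessage(e)
)
res_err
```

Passing the variables to the NSE version makes it look for a column literally named input, which does not exist, so it fails with the same kind of message as in the question, while the string-based version works. If your installed tidytext no longer provides unnest_tokens_ (newer releases moved to tidy evaluation), unnest_tokens(!!output, !!input) is the usual way to inject the stored strings; check the documentation for your version.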

Upvotes: 5
