bgreen

Reputation: 87

[readtext]: read output from stringi replace into Quanteda

[CODE]
DATA_DIR <- system.file("extdata/", package = "readtext")

x<- list.files("extdata/*", recursive = TRUE) 
library("stringi")
stri_replace_all_regex(x, "Whereas.*Whereas\n{2}", "") |>
  cat() [CODE]

In this example, I believe all text between two instances of 'Whereas' is removed. How do I read the text edited by stringi into quanteda? I want to retain the original text, but analyse the text with the selected content removed.

Upvotes: 0

Views: 26

Answers (1)

Ken Benoit

Reputation: 14902

That code won't work for various reasons, but let's assume that you have a character vector x containing your texts, and that you want to "retain the original text, but analyse the text with the selected content removed" by stringi.

You can use this code:

library("quanteda")

# keep the original, unmodified texts as a corpus
corp <- corpus(x)

# strip the unwanted span from each text, then tokenise the result for analysis
toks <- stringi::stri_replace_all_regex(x, "Whereas.*Whereas\n{2}", "") |>
  tokens()

# analyse toks here
# [code]

So your original texts stay in corp, and the subsequent analysis runs on the modified object toks.
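To make the pattern concrete, here is a minimal sketch on two made-up texts. The sample strings, the names doc1 and doc2, and the object names cleaned and dfmat are assumptions for illustration only, not from the question. It re-attaches the element names after the replacement so that the document names in the tokens object line up with those in the corpus.

```r
library("stringi")
library("quanteda")

# Hypothetical sample texts (illustration only, not the asker's data)
x <- c(
  doc1 = "Preamble text. Whereas the parties agree; Whereas\n\nThe operative clause follows.",
  doc2 = "A second text with nothing to remove."
)

# The original texts, kept untouched
corp <- corpus(x)

# Cleaned copy of the texts; names are re-attached explicitly so the
# document names match those of corp regardless of what stringi returns
cleaned <- stri_replace_all_regex(x, "Whereas.*Whereas\n{2}", "")
names(cleaned) <- names(x)

# Analysis objects built from the cleaned texts only
toks <- tokens(cleaned, remove_punct = TRUE)
dfmat <- dfm(toks)
topfeatures(dfmat, 5)
```

Two things to keep in mind about the regular expression itself: in the ICU engine that stringi uses, `.` does not match line breaks by default, so a span that runs over several lines needs something like `opts_regex = stri_opts_regex(dotall = TRUE)`; and `.*` is greedy, so with several 'Whereas' clauses it removes everything up to the last one that is followed by two newlines.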

Upvotes: 0
