Reputation: 87
[CODE]
DATA_DIR <- system.file("extdata/", package = "readtext")
x<- list.files("extdata/*", recursive = TRUE)
library("stringi")
stri_replace_all_regex(x, "Whereas.*Whereas\n{2}", "") |>
cat() [CODE]
In this example I believe all text between two instances of 'Whereas' is removed. How do I read this text, as edited by stringi, into quanteda? I want to retain the original text, but analyse the text with the selected content removed.
Upvotes: 0
Views: 26
Reputation: 14902
That code won't work for various reasons, but let's assume that you have a character vector x
with your texts and you want to "retain the original text, but analyse the text with the selected content removed" using stringi.
You can use this code:
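library("quanteda")
# keep your original text as a corpus
corp <- corpus(x)
# get a tokens object for analysis, built from an edited copy of x
# (stri_replace_all_regex() returns a new vector, so x and corp are unchanged)
toks <- stringi::stri_replace_all_regex(x, "Whereas.*Whereas\n{2}", "") |>
    tokens()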
# analyse toks here
# [code]
So your original texts are kept in corp
and the subsequent analysis is run on the modified object, toks.
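To see the whole thing end to end, here is a minimal sketch. The two sample texts are invented for illustration, and it assumes both instances of "Whereas" sit on the same line and are followed by a blank line, which is what the pattern from the question expects.

```r
library("quanteda")
library("stringi")

# hypothetical sample texts, made up for this illustration
x <- c("Whereas the parties agree. Whereas\n\nThe committee shall meet monthly.",
       "No preamble here.")

# returns an edited copy; x itself is left untouched
x_edited <- stri_replace_all_regex(x, "Whereas.*Whereas\n{2}", "")

corp <- corpus(x)          # original texts
toks <- tokens(x_edited)   # edited texts, for the analysis

print(x_edited)
```

Note that in stringi's regex engine `.` does not match line breaks by default, so if the span between the two "Whereas" crosses several lines you would probably need `opts_regex = stri_opts_regex(dotall = TRUE)`. Also, `.*` is greedy, so with more than two instances on a line it runs to the last one.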
Upvotes: 0