spindoctor

Reputation: 1895

Executing multiple patterns and replacements stored in a data frame

I have a data frame of search and replace patterns. The variable pattern contains the regexes I would like to find, and the variable replacement contains the replacements.

This is basically a pre-processing script for some text analysis. I would like to execute these search-and-replace operations on each document, but I can't work out how to apply every pattern and its replacement in turn to a single document. The code below shows my problem.

What am I missing?

library(stringi)
library(purrr)
pattern<-c("good-bye[a-z]+|good-bye", "-like")
replacement<-c("goodbye", "xlike")

df<-data.frame(pattern, replacement, stringsAsFactors = F)
df
str(df)
doc<-c("hello let's say some good-byes you look-like someone.")

pmap(df, stri_replace_all_regex, str=doc, vectorize_all=T)

Upvotes: 1

Views: 43

Answers (1)

akrun

Reputation: 887118

We could use reduce2 to update the 'doc' string on each iteration over the 'pattern'/'replacement' pairs, so that each replacement is applied to the result of the previous one.

library(purrr)
library(stringi)
reduce2(df$pattern, df$replacement, stri_replace_all_regex, .init = doc)
#[1] "hello let's say some goodbye you lookxlike someone."
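
For comparison, here is a minimal sketch of the same idea without purrr. The pmap call in the question runs stri_replace_all_regex once per row, each time on the original doc, so it returns a list with one partially cleaned string per pattern rather than a single string with every replacement applied. The loop below chains the replacements by hand, and the last call relies on stringi's vectorize_all = FALSE, which applies all the patterns in turn to each string. The names out_loop and out_vec are only for illustration.

```r
library(stringi)

# Same patterns and document as in the question.
df <- data.frame(
  pattern = c("good-bye[a-z]+|good-bye", "-like"),
  replacement = c("goodbye", "xlike"),
  stringsAsFactors = FALSE
)
doc <- "hello let's say some good-byes you look-like someone."

# Explicit loop: feed the result of each replacement into the next one.
out_loop <- doc
for (i in seq_len(nrow(df))) {
  out_loop <- stri_replace_all_regex(out_loop, df$pattern[i], df$replacement[i])
}

# stringi alternative: with vectorize_all = FALSE every pattern is applied,
# in order, to each element of the input string vector.
out_vec <- stri_replace_all_regex(doc, df$pattern, df$replacement,
                                  vectorize_all = FALSE)

print(out_loop)
print(out_vec)
```

Since doc can be a character vector, the vectorize_all = FALSE form should also work unchanged when there are several documents to clean.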

Upvotes: 1
