Solana

Reputation: 55

R: replace text within a string by lookup table

I have already tried to find a solution to my problem on the internet, and I have the feeling I know all the small pieces but am unable to put them together. I'm quite new to programming, so please be patient :D...

I have a (in reality much larger) text string which looks like this:

string <- "Test test [438] test. Test 299, test [82]."

Now I want to replace the numbers in square brackets using a lookup table and get a new string back. There are other numbers in the text but I only want to change those in brackets and need to have them back in brackets.

lookup <- read.table(text = "
Number   orderedNbr
1 270 1
2 299 2
3 82  3
4 314 4
5 438 5", header = TRUE)

I have made a pattern to find the square brackets using regular expressions

pattern <- "\\[(\\d+)\\]"

I looked all around and tried sub/gsub, lapply, merge and str_replace, but I am unable to make it work. I don't know how to tell R to look at what's inside the brackets, find that same value in the lookup table, and return what's in the next column.

I hope you can help me, and that it's not a really stupid question. Thx

Upvotes: 4

Views: 1743

Answers (3)

Tim Biegeleisen

Reputation: 522797

Read your table of keys and values (a two-column table) into a data frame. If your source information is in a flat text file, you can use read.csv to obtain a data frame. In the example below, I hard-code a data frame with just two entries, then iterate over it and make replacements in the input string.

df <- data.frame(keys=c(438, 82), values=c(5, 3))
string <- "Test test [438] test. Test [82]."
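# Replace each key, but only where it sits between square brackets.
# The lookbehind/lookahead leave the brackets themselves in place.
for (i in seq_len(nrow(df))) {
    string <- gsub(paste0("(?<=\\[)", df$keys[i], "(?=\\])"), df$values[i], string, perl=TRUE)
}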

string

[1] "Test test [5] test. Test [3]."


Note: As @Frank wisely pointed out, my solution would fail if your number markers (e.g. [438]) have replacements which also appear as other markers. That is, if replacing a key with a value produces yet another key, a later iteration may replace it again. If this is a possibility, I would suggest using markers for which this cannot happen. For example, you could remove the brackets after each replacement.
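To make the caveat above concrete, here is a small sketch. The data frame is hypothetical (it is not the asker's lookup table) and is chosen so that one replacement value, 82, is also a key. It compares the loop with a single-pass replacement via regmatches<-, which is one possible way to avoid the chaining, not necessarily the only one.

```r
# Hypothetical lookup in which a replacement value (82) is itself a key.
df <- data.frame(keys = c(438, 82), values = c(82, 3))
string <- "Test test [438] test. Test [82]."

# Sequential replacement: [438] becomes [82], and the next pass turns it into [3].
chained <- string
for (i in seq_len(nrow(df))) {
  chained <- gsub(paste0("(?<=\\[)", df$keys[i], "(?=\\])"), df$values[i],
                  chained, perl = TRUE)
}
chained
# [1] "Test test [3] test. Test [3]."

# Single pass: locate all bracketed numbers first, then substitute them at once,
# so a freshly inserted value is never matched again.
m <- gregexpr("(?<=\\[)\\d+(?=\\])", string, perl = TRUE)
found <- as.numeric(regmatches(string, m)[[1]])
single <- string
regmatches(single, m) <- list(as.character(df$values[match(found, df$keys)]))
single
# [1] "Test test [82] test. Test [3]."
```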

Upvotes: 2

Frank

Reputation: 66819

You can use regmatches<- with a pattern containing lookahead/lookbehind:

patt = "(?<=\\[)\\d+(?=\\])"
m = gregexpr(patt, string, perl=TRUE)
v = as.integer(unlist(regmatches(string, m)))

`regmatches<-`(string, m, value = list(lookup$orderedNbr[match(v, lookup$Number)]))
# [1] "Test test [5] test. Test 299, test [3]."

Or to modify the string directly, change the last line to the more readable...

regmatches(string, m) <- list(lookup$orderedNbr[match(v, lookup$Number)])
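One case the snippet above does not cover is a bracketed number that has no entry in the lookup table: match() then returns NA. The following is a sketch of one way to handle that, assuming you want such numbers left unchanged. The lookup data frame mirrors the values from the question, and the [999] marker is made up for illustration.

```r
# Same values as the question's lookup table, built directly for self-containment.
lookup <- data.frame(Number = c(270, 299, 82, 314, 438), orderedNbr = 1:5)
# [999] is deliberately absent from lookup (hypothetical example).
string <- "Test [438] test, [999] test [82]."

patt <- "(?<=\\[)\\d+(?=\\])"
m <- gregexpr(patt, string, perl = TRUE)
old <- regmatches(string, m)[[1]]
new <- lookup$orderedNbr[match(as.integer(old), lookup$Number)]

# Keep the original number wherever the lookup has no entry.
new <- ifelse(is.na(new), old, as.character(new))
regmatches(string, m) <- list(new)
string
# [1] "Test [5] test, [999] test [3]."
```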

Upvotes: 1

akrun

Reputation: 887981

We can use regex lookarounds to match only the numbers that are inside square brackets:

library(gsubfn)
gsubfn("(?<=\\[)(\\d+)(?=\\])", setNames(as.list(lookup$orderedNbr), 
             lookup$Number), string, perl = TRUE)
#[1] "Test test [5] test. Test 299, test [3]."

Or, without regex lookarounds, by pasting the square brackets onto each column of 'lookup':

gsubfn("(\\[\\d+\\])", setNames(as.list(paste0("[", lookup$orderedNbr, 
          "]")), paste0("[", lookup$Number, "]")), string)
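Both calls rely on gsubfn accepting a named list as the replacement: the matched text is looked up by name and the corresponding element is substituted. A quick base-R sketch of what that list looks like (the lookup data frame here mirrors the question's values and is rebuilt only to keep the snippet self-contained):

```r
lookup <- data.frame(Number = c(270, 299, 82, 314, 438), orderedNbr = 1:5)

# The names are the old numbers (coerced to character), the elements the new ones.
repl <- setNames(as.list(lookup$orderedNbr), lookup$Number)

repl[["438"]]
# [1] 5
repl[["82"]]
# [1] 3
```

Since the names are plain strings, the second variant only has to paste the brackets onto both the names and the values so that they line up with the full "[438]"-style match.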

Upvotes: 2
