Reputation: 55
I already have tried to find a solutions on the internet for my problem, and I have the feeling I know all the small pieces but I am unable to put them together. I'm quite knew at programing so pleace be patient :D...
I have a (in reality much larger) text string which look like this:
string <- "Test test [438] test. Test 299, test [82]."
Now I want to replace the numbers in square brackets using a lookup table and get a new string back. There are other numbers in the text but I only want to change those in brackets and need to have them back in brackets.
lookup <- read.table(text = "
Number orderedNbr
1 270 1
2 299 2
3 82 3
4 314 4
5 438 5", header = TRUE)
I have made a pattern to find the square brackets using regular expressions
pattern <- "\\[(\\d+)\\]"
Now I looked all around and tried sub/gsub, lapply, merge, str_replace, but I find myself unable to make it work... I don't know how to tell R! to look what's inside the brackets, to look for that same argument in the lookup table and give out what's standing in the next column.
I hope you can help me, and that it's not a really stupid question. Thx
Upvotes: 4
Views: 1743
Reputation: 522797
Read your table of keys and values (a 2 column table) into a data frame. If your source information be a flat text file, then you can easily use read.csv
to obtain a data frame. In the example below, I hard code a data frame with just two entries. Then, I iterate over it and make replacements in the input string.
df <- data.frame(keys=c(438, 82), values=c(5, 3))
string <- "Test test [438] test. Test [82]."
for (i in 1:nrow(df)) {
string <- gsub(paste0("(?<=\\[)", df$keys[i], "(?=\\])"), df$values[i], string, perl=TRUE)
}
string
[1] "Test test 5 test. Test 3."
Note: As @Frank wisely pointed out, my solution would fail if your number markers (e.g. [438]
) happen to have replacements which are numbers also appearing as other markers. That is, if replacing a key with a value results in yet another key, there could be problems. If this be a possibility, I would suggest using markers for which this cannot happen. For example, you could remove the brackets after each replacement.
Upvotes: 2
Reputation: 66819
You can use regmatches<-
with a pattern containing lookahead/lookbehind:
patt = "(?<=\\[)\\d+(?=\\])"
m = gregexpr(patt, string, perl=TRUE)
v = as.integer(unlist(regmatches(string, m)))
`regmatches<-`(string, m, value = list(lookup$orderedNbr[match(v, lookup$Number)]))
# [1] "Test test [5] test. Test 299, test [3]."
Or to modify the string directly, change the last line to the more readable...
regmatches(string, m) <- list(lookup$orderedNbr[match(v, lookup$Number)])
Upvotes: 1
Reputation: 887981
We can use a regex look around to match only numbers that are inside a square bracket
library(gsubfn)
gsubfn("(?<=\\[)(\\d+)(?=\\])", setNames(as.list(lookup$orderedNbr),
lookup$Number), string, perl = TRUE)
#[1] "Test test [5] test. Test [3]."
Or without regex lookaround by paste
ing the square bracket on each column of 'lookup'
gsubfn("(\\[\\d+\\])", setNames(as.list(paste0("[", lookup$orderedNbr,
"]")), paste0("[", lookup$Number, "]")), string)
Upvotes: 2