Find unused character(s) in string

Question

For a library call I have to provide a separator that must not occur in the text, because otherwise the library call gets confused.

Now I was wondering how I can adapt my code to ensure that the separator I use is guaranteed not to occur in the input text.

I am solving this issue with a while loop: I make a (hardcoded) assumption about the most unlikely string in the input, check whether it is present and, if so, lengthen the string. This works but feels hackish, so I was wondering whether there is a more elegant way (e.g. an existing base R function, or a loop-free solution) that does the same. Ideally, the separator found is also minimal in length.

I could simply hardcode a large enough set of potential separators and look for the first one not occurring in the text, but this may also break at some point if all of these separators happen to occur in my input.

The reason I ask is that, even if it will probably never happen (well, never say never), I am afraid that in some distant future there will be an input string that requires thousands of while-loop iterations before an unused string is found.

input_string <- c("a/b", "a#b", "a//b", "a-b", "a,b", "a.b")
orig_sep <- sep <- "/" ## first guess as a separator
while(any(grepl(sep, input_string, fixed = TRUE))) {
  sep <- paste0(sep, orig_sep)
}
print(sep)
# "///"

GKi · Accepted Answer

If a single unused ASCII character exists, you can find it with table.

tt <- table(factor(strsplit(paste(input_string, collapse = ""), "")[[1]]
       , rawToChar(as.raw(32:126), TRUE)))
names(tt)[tt==0]

rawToChar(as.raw(32:126), TRUE) gives you all printable ASCII characters, which are used as the factor levels, and table counts how often each one occurs. Any character with a count of 0 can be used as the separator.
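
As a minimal sketch of the same idea without table, you could take the set difference between the candidate characters and the characters actually present. Note that unused_ascii is a hypothetical helper name introduced here for illustration, not a base R function, and setdiff is swapped in for the table/factor approach above.

```r
# Hypothetical helper (not part of base R): printable ASCII characters
# that occur nowhere in the input.
unused_ascii <- function(input_string) {
  # All printable ASCII characters, one per element
  candidates <- rawToChar(as.raw(32:126), multiple = TRUE)
  # Every distinct character that occurs anywhere in the input
  used <- unique(unlist(strsplit(input_string, "")))
  # Candidates that never occur, kept in ASCII order
  setdiff(candidates, used)
}

input_string <- c("a/b", "a#b", "a//b", "a-b", "a,b", "a.b")
head(unused_ascii(input_string), 3)
```

If the result is empty, every printable ASCII character is taken and a longer separator is needed.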

If you need two ASCII characters, you can try the following, which returns all possible delimiters:

x <- rawToChar(as.raw(32:126), TRUE)
x <- c(outer(x, x, paste0))
x[!sapply(x, function(y) {any(grepl(y, input_string, fixed=TRUE))})]

Or for n ASCII characters:

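orig_sep <- x <- rawToChar(as.raw(32:126), TRUE)  # printable ASCII characters
repeat {
  # keep the candidates that occur in none of the input strings
  sep <- x[!sapply(x, function(y) any(grepl(y, input_string, fixed = TRUE)))]
  if (length(sep) > 0) break
  # none left: extend every candidate by one more character
  x <- c(outer(x, orig_sep, paste0))
}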
sep
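
On the worry about thousands of iterations: an input with N characters in total has at most N distinct substrings of any given length, while there are 95^k candidates of length k, so an unused candidate must exist as soon as 95^k exceeds N. The loop therefore stops after very few rounds, but the candidate set grows by a factor of 95 each round. A minimal sketch of the loop as a function with a length cap follows; the name shortest_unused_sep and the max_len argument are assumptions made for illustration.

```r
# Hypothetical wrapper around the repeat loop above; the function name and
# the max_len argument are illustrative assumptions.
shortest_unused_sep <- function(input_string, max_len = 3L) {
  alphabet <- rawToChar(as.raw(32:126), multiple = TRUE)
  candidates <- alphabet
  for (len in seq_len(max_len)) {
    # From the second round on, extend every candidate by one character
    if (len > 1L) candidates <- c(outer(candidates, alphabet, paste0))
    # TRUE for each candidate that occurs in at least one input element
    used <- vapply(candidates,
                   function(y) any(grepl(y, input_string, fixed = TRUE)),
                   logical(1))
    # Return the first unused candidate of the current (minimal) length
    if (!all(used)) return(candidates[!used][1])
  }
  stop("no unused separator of length <= max_len found")
}

input_string <- c("a/b", "a#b", "a//b", "a-b", "a,b", "a.b")
shortest_unused_sep(input_string)
```

The cap keeps memory bounded: with the default of 3, at most 95^3 = 857,375 candidates are ever built.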

To search the 1- and 2-character candidates with a single sapply loop and take the shortest separator:

x <- rawToChar(as.raw(32:126), TRUE)
x <- c(x, outer(x, x, paste0))
x[!sapply(x, function(y) {any(grepl(y, input_string, fixed=TRUE))})][1]
#[1] " "

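If you want to know how many times a character needs to be repeated to work as a separator, as you do in the question, you can use gregexpr. Two variants follow: "/*" also matches the empty string, so inputs without "/" contribute a match length of 0, whereas "/+" reports a match length of -1 for such inputs, which the c(0, ...) guards against.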

strrep("/", max(sapply(gregexpr("/*", input_string)
  , function(x) max(attributes(x)$match.length)))+1)
#[1] "///"

strrep("/", max(c(0, sapply(gregexpr("/+", input_string)
  , function(x) max(attributes(x)$match.length))))+1)
#[1] "///"
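
If the repeated character may be a regex metacharacter such as ".", the pattern passed to gregexpr would need escaping. A minimal sketch that sidesteps this by using rle on the split characters instead of gregexpr is shown below; repeat_until_unused is a hypothetical helper name, and it assumes ch is a single character.

```r
# Hypothetical helper: shortest run of `ch` that does not occur in the input.
# Uses rle() instead of gregexpr(), so `ch` needs no regex escaping.
repeat_until_unused <- function(input_string, ch = "/") {
  longest_run <- function(s) {
    # Run-length encode the characters of one string
    runs <- rle(strsplit(s, "")[[1]])
    hits <- runs$lengths[runs$values == ch]
    # 0 when `ch` does not occur at all
    if (length(hits) == 0L) 0L else max(hits)
  }
  # One more repetition than the longest run found anywhere
  strrep(ch, max(0L, vapply(input_string, longest_run, integer(1))) + 1L)
}

input_string <- c("a/b", "a#b", "a//b", "a-b", "a,b", "a.b")
repeat_until_unused(input_string)
```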
