Sean Mackesey

Reputation: 10939

Joining regular expressions in julia

x = r"abc"
y = r"def"
z = join([x,y], "|")

z # => r"r\"abc\"|r\"def\""

Is there a way to join (and in general manipulate) Regex objects that deals only with the regex content (i.e. does not treat the r prefix as if it were part of the content)? The desired output for z is:

z # => r"abc|def"

Upvotes: 10

Views: 1711

Answers (2)

Leonard Neon

Reputation: 206

Instead of joining regexes, I think it is better to join strings and then convert the result to a regex. That way, you can solve your problem as follows:

x = "abc"
y = "def"
z = Regex(join([x,y], "|"))
println(z)

You should get r"abc|def" as the output.


Note: Here I adapted the answer of Michael Fox by removing the macro.
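
The same idea extends to any number of fragments. Here is a minimal sketch, assuming a recent Julia 1.x; the helper name alternation is hypothetical and not part of Julia:

```julia
# Hypothetical helper: join plain-string fragments with "|" and compile once.
alternation(parts) = Regex(join(parts, "|"))

z = alternation(["abc", "def", "ghi"])

# The compiled regex matches any one of the fragments.
println(occursin(z, "xxdefxx"))
```

Keeping the fragments as plain strings means only the final, joined pattern is compiled.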

Upvotes: 1

Michael Fox

Reputation: 3642

macro p_str(s) s end
x = p"abc"
y = p"def"
z = Regex(join([x,y], "|"))

The r"quote" operator actually compiles a regular expression for you, which takes time. If you only have fragments of a regular expression that you want to combine into a bigger one, you should store the fragments using "regular quotes".

But what about the sketchy escaping rules that you get with r"quote" versus "regular quotes", you ask? If you want the sketchy r"quote" rules without compiling a regular expression immediately, you can use a macro like:

macro p_str(s) s end

Now you have a p"quote" that escapes like an r"quote" but just returns a string.
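
To make the escaping difference concrete, here is a small sketch, assuming a recent Julia 1.x; the variable names are only illustrative:

```julia
# p"..." hands the literal text to the macro without processing backslashes.
macro p_str(s) s end

raw_digits = p"\d+"     # backslash kept exactly as typed
std_digits = "\\d+"     # a regular string needs the backslash doubled

# Both spell the same three-character pattern.
println(raw_digits == std_digits)

# Compile only once, after the fragments have been joined.
re = Regex(join([raw_digits, p"[a-z]+"], "|"))
println(occursin(re, "42"))
```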

Not to go off topic, but you might define a bunch of quotes for getting around tricky alphabets. Here are some convenient ones:

                                       # "baked\nescape"    -> baked\nescape
macro p_mstr(s) s end                  # p"""raw\nescape""" -> raw\\nescape
macro dq_str(s) "\"" * s * "\"" end    # dq"with quotes"    -> "with quotes"
macro sq_str(s) "'" * s * "'" end      # sq"with quotes"    -> 'with quotes'
macro s_mstr(s) strip(s) end           # s"""  "stripme" """ -> "stripme"

When you're done making fragments you can do your join and make a regex like:

myre = Regex(join([x, y], "|"))

Just like you thought.

If you want to learn more about which fields an object has (such as Regex.pattern), try:

julia> dump(r"pat")
Regex 
  pattern: ASCIIString "pat"
  options: Uint32 33564672
  regex: Array(Uint8,(61,)) [0x45,0x52,0x43,0x50,0x3d,0x00,0x00,0x00,0x00,0x28  …   0x1d,0x70,0x1d,0x61,0x1d,0x74,0x72,0x00,0x09,0x00]
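
If the fragments already exist as Regex objects, as in the question, the pattern field shown in the dump could be used to recover their source strings before joining. This is a minimal sketch, assuming a recent Julia 1.x where the field is read as x.pattern; note that it drops any flags set on the original regexes:

```julia
x = r"abc"
y = r"def"

# Pull the source string out of each Regex, join, then compile the result.
z = Regex(join([x.pattern, y.pattern], "|"))

println(z.pattern)
```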

Upvotes: 8
