Reputation: 2157
I'm trying to build a parser using String.split/3
as an exercise just to see if I can make it faster than classic recursive pattern matching (by effectively allocating fewer strings), but I'm having trouble making it correct.
I need to be able to split binaries in 2 parts at either a {
or a }
, and String.split(str, ["{", "}"], parts: 2)
does exactly that of course. My problem is that the result contains only the list [part1, part2]
but it does not contain the character it matched on, which I need because it affects the behavior of the parser.
My first instinct was to make my own index/2
but after reading up some more it seems like a bad idea
Is there a better alternative for my use case? I'd like to go through the string as few times as possible, creating new strings only at {
and }
boundaries.
Thank you for reading!
Upvotes: 2
Views: 1122
Reputation: 222060
Since you only want to split at the first occurrence, I'd suggest using :binary.match/3
and :binary.part/3
here. :binary.match/3
returns a tuple of index of the match and the length of the match on success, which can then be used with :binary.part/2
to split the binary.
defmodule A do
def split(binary) do
case :binary.match(binary, ["{", "}"]) do
{start, length} ->
before = :binary.part(binary, 0, start)
match = :binary.part(binary, start, length)
after_ = :binary.part(binary, start + length, byte_size(binary) - (start + length))
{before, match, after_}
:nomatch -> nil
end
end
end
IO.inspect A.split("foo { bar") |> IO.inspect
IO.inspect A.split("foo } bar") |> IO.inspect
IO.inspect A.split("foo + bar") |> IO.inspect
Output:
{"foo ", "{", " bar"}
{"foo ", "{", " bar"}
{"foo ", "}", " bar"}
{"foo ", "}", " bar"}
nil
nil
This implementation goes through the string only once, in the :binary.match/3
call. All other functions and operations used are O(1)
.
I've used a similar method in an XML parser I was working on and it provided a huge speed up compared to recursive pattern matching on the binary.
Edit: you can use pattern matching to make this code way shorter, and I'm pretty sure with the exact same performance as above:
defmodule A do
def split(binary) do
case :binary.match(binary, ["{", "}"]) do
{start, length} ->
<<a::binary-size(start), b::binary-size(length), c::binary>> = binary
{a, b, c}
:nomatch -> nil
end
end
end
Upvotes: 3
Reputation: 50215
Dig a little deeper in the String.split docs and use the include_captures
option.
iex> String.split("Foo{bar}", ~r({|}), [include_captures: true, trim: true])
["Foo", "{", "bar", "}"]
Upvotes: 3