Reputation: 24888
Let's say I have a list of words, where a keyword, in this case "stop", demarcates full sentences:
["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]
which I want to turn into:
[["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]
I know I can do this with strings with String.split, but ideally I'd like to learn how to tackle the above problem with fundamental functional constructs, such as recursion on [head|tail] etc, but I cannot figure out where to start on how to accumulate intermediate lists.
Upvotes: 8
Views: 1687
Reputation: 7779
You can use chunk_by/2
:
["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]
|> Enum.chunk_by(fn(x) -> x != "stop" end)
|> Enum.reject(fn(x) -> x == ["stop"] end)
Out of curiosity, I wanted to benchmark the performance of the implementations given to this question. The benchmark was for 100,000 calls of each implementation and I ran it 3 times. Here are the results if someone is interested:
0.292903s | 0.316024s | 0.292106s | chunk_by
0.168113s | 0.152456s | 0.151854s | Main.main (@Dogbert's answer)
0.167387s | 0.148059s | 0.143763s | chunk_on (@Martin Svalin's answer)
0.177080s | 0.180632s | 0.185636s | splitter (@stephen_m's answer)
Upvotes: 7
Reputation: 910
Personally, I like AbM's
answer best and I prefer it over and above this answer due to the easy readability.
That said, I decided out of interest to see if it could be done without the final Enum.reject
function.
def splitter(list) do
res =
List.foldl(list, [], fn(word, acc)->
case {word, acc} do
{"stop", []} ->
[]
{word, []} ->
[[word]]
{"stop", [[], acc]} ->
[h | t] = acc
[Enum.reverse(h) | t]
{"stop", acc} ->
[h | t] = acc
[[] | [Enum.reverse(h) | t]]
{word, [[] | acc]} ->
[[word] | acc]
{word, acc} ->
[h | t] = acc
new_h = [word | h]
if t == [], do: [new_h], else: [new_h | t]
end
end)
res = if List.first(res) == [], do: ([h | t] = res; t), else: (res)
Enum.reverse(res)
end
splitter(["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"])
# [["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]
Looking at the code is a bit of a headache and I probably wouldn't use it for that reason but I think it runs faster.
Upvotes: 2
Reputation: 2267
This is almost what Enum.chunk_by/2
does.
def chunk_by(enumerable, fun)
Splits enumerable on every element for which fun returns a new value.
But chunk_by
won't throw away any elements, so we can combine it with Enum.filter/2
.
list = [1, 2, 3, :stop, 4, 5, 6, :stop, 7, 8, :stop] # analogous to your list
list
|> Enum.chunk_by(&(&1 == :stop))
# at this point, you have [[1,2,3], [:stop], [4,5,6], [:stop], [7,8], [:stop]]
|> Enum.reject(&(&1 == [:stop]))
# here you are: [[1,2,3], [4,5,6], [7,8]]
A second approach would be to use Enum.reduce/3
. Since we build up the accumulator at the front, pushing the first elements we find towards the back, it makes sense to reverse the list before we reduce it. Otherwise we'll end up with a reversed list of reversed lists.
We'll potentially get empty lists, like the final :stop
in our example list. So again, we filter the list at the end.
list
|> Enum.reverse
|> Enum.reduce([[]], fn # note: the accumulator is a nested empty list
:stop, acc -> [[] | acc] # element is the stop word, start a new list
el, [h | t] -> [[el | h] | t] # remember, h is a list, t is list of lists
end)
|> Enum.reject(&Enum.empty?/1)
Finally, let's walk the list ourselves, and build an accumulator. If this reminds you of the reduce
version, it's no coincidence.
defmodule Stopword do
def chunk_on(list, stop \\ :stop) do
list
|> Enum.reverse
|> chunk_on(stop, [[]])
end
defp chunk_on([], _, acc) do
Enum.reject(acc, &Enum.empty?/1)
end
defp chunk_on([stop | t], stop, acc) do
chunk_on(t, stop, [[] | acc])
end
defp chunk_on([el | t], stop, [head_list | tail_lists]) do
chunk_on(t, stop, [[el | head_list] | tail_lists])
end
end
We use the common pattern of a public function that doesn't require users to worry about the accumulator, and passing on the inputs to a private arity+1 function with an accumulator. Since we're building up a list of lists, it's useful to start off the accumulator with an empty list inside it. This way, we don't have to special case when the accumulator is empty.
We reverse the list before we walk it, as we did for reduce
, just as we reject empty lists after we're done. The same reasons apply.
We use pattern matching to identify the stop word. The stop word marks the beginning of a new list, so we add a new, empty list and throw away the stop word.
A regular word is simply put at the front of the first list, in our list of lists. The syntax is a bit unwieldy with all those bars and brackets.
Upvotes: 3
Reputation: 222060
Here's a simple tail-recursive implementation using pattern matching:
defmodule Main do
def split_on(list, on) do
list
|> Enum.reverse
|> do_split_on(on, [[]])
|> Enum.reject(fn list -> list == [] end)
end
def do_split_on([], _, acc), do: acc
def do_split_on([h | t], h, acc), do: do_split_on(t, h, [[] | acc])
def do_split_on([h | t], on, [h2 | t2]), do: do_split_on(t, on, [[h | h2] | t2])
def main do
["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]
|> split_on("stop")
|> IO.inspect
end
end
Main.main
Output:
[["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]
Upvotes: 3