Recursive list analysis in Erlang

Question

I'm playing with Erlang and trying to write an S-expression parser. I find it to be an easy task in Python using stacks and loops, but it's non-trivial for me as a beginner in immutable variables and Erlang data structures.

I need to transform a list in Erlang like this:

X = ["0", "(", "1", "2", "3", ")"],
Res = transform(X). % ["0", ["1", "2", "3"]]

By now, I've come to this:

transform(List) ->
    lists:map(fun(X)->
                      case string:equal("(", X) of
                          %% recursive call with sublist of List from "(" to ")" as argument
                          true -> transform_to_list(Lack) 
                      end
              end, List).

Not sure how to get the sublist Lack and pass it as an argument. Am I going in the right direction?

Steve Vinoski · Accepted Answer

You can solve this using an accumulator and pattern matching:

-module(t).
-export([transform/1]).

transform(List) ->
    transform(List, []).

transform([], Acc) ->
    lists:reverse(Acc);
transform(["("|T], Acc) ->
    transform(T, {[],Acc});
transform([")"|T], {L,{L2,Acc}}) ->
    transform(T, {[lists:reverse(L)|L2],Acc});
transform([")"|T], {L,Acc}) ->
    transform(T, [lists:reverse(L)|Acc]);
transform([H|T], {L,Acc}) ->
    transform(T, {[H|L],Acc});
transform([H|T], Acc) ->
    transform(T, [H|Acc]).

The transform/1 function just sets up an empty accumulator for transform/2, where all the work is done.

The transform/2 function is separated into multiple pattern-matching recursive clauses:

The first clause handles the case where we've exhausted the input list, and it simply returns the reversed accumulator. Reversal is needed because items are pushed into the accumulator, so it ends up in reverse order. This is a common pattern in Erlang and other functional languages.
The second clause recognizes a "(", which starts a new sublist. To handle it, it changes the accumulator to a 2-tuple, where the first item is a sublist accumulator and the second item is the old accumulator.
The third and fourth clauses handle ")", which ends a sublist. The third clause is for the case where the accumulator is a tuple holding a second element which is also a tuple; it adds the new sublist as an item to the previous sublist and pops one level from the accumulator tuple. The fourth clause handles the case where the original accumulator in the tuple is a list, adding the new sublist to the head of the original accumulator to form a new accumulator list.
The fifth and sixth clauses handle input items that are not grouping operators. The fifth clause handles the case when the accumulator is a tuple, while the sixth clause handles the case when the accumulator is a list.

Running this on your original example shows the correct answer:

1> c(t).
{ok,t}
2> t:transform(["0", "(", "1", "2", "3", ")"]).
["0",["1","2","3"]]

But it can also handle nested groups:

3> t:transform(["0", "(", "11", "22", "(", "333", "444",
                "(", "5555", ")", "666", ")", "77", "88", ")", "9"]).
["0",["11","22",["333","444",["5555"],"666"],"77","88"],"9"]

Recursive list analysis in Erlang

Answers (2)

Related Questions