Helen lomonosova

Reputation: 33

How can I split a binary in Erlang by spaces?

I need to split a binary like this:

Bin = <<"Hello my friend">>.
split_by_space(Bin).

and get:

[<<"Hello">>, <<"my">>, <<"friend">>]

Upvotes: 2

Views: 885

Answers (4)

Brujo Benavides

Reputation: 1958

No big deal, you can use binary:split/3:

1> Bin = <<"Hello my friend">>.
<<"Hello my friend">>
2> binary:split(Bin, <<" ">>, [global]).
[<<"Hello">>,<<"my">>,<<"friend">>]
3>
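One thing to watch for: with only [global], leading, trailing, or repeated spaces produce empty binaries in the result. A minimal sketch of the split_by_space/1 function from the question, assuming you want those empty parts dropped, which the trim_all option of binary:split/3 does. The module name binsplit is just a placeholder for illustration:

```erlang
-module(binsplit).
-export([split_by_space/1]).

%% Split a binary on single spaces. trim_all drops the empty parts that
%% leading, trailing, or repeated spaces would otherwise leave behind.
split_by_space(Bin) when is_binary(Bin) ->
    binary:split(Bin, <<" ">>, [global, trim_all]).
```

With this, binsplit:split_by_space(<<"  Hello   my friend ">>) returns [<<"Hello">>,<<"my">>,<<"friend">>].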

Upvotes: 0

Hynek -Pichi- Vychodil

Reputation: 26121

Here is a simpler solution that is roughly 2-10x more efficient than Pouriya's:

split(Bin) when is_binary(Bin) ->
    skip_spaces(Bin);
split(A) ->
    error(badarg, [A]).

skip_spaces(<<>>) ->                        % empty
    [];
skip_spaces(<<$\s, Rest/bytes>>) ->       % the next space
    skip_spaces(Rest);
skip_spaces(<<Bin/bytes>>) ->               % not a space
    get_word(Bin, 1).

get_word(Bin, I) ->
    case Bin of
        <<Word:I/bytes>> ->                 % the last word
            [Word];
        <<Word:I/bytes, $\s, Rest/bytes>> -> % the next word
            [Word|skip_spaces(Rest)];
        _ ->                                % a next char of the word
            get_word(Bin, I+1)
    end.

It parses at around 15-40 MB/s on a typical CPU.
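If you want to check a throughput figure like this on your own machine, here is a rough sketch using timer:tc/1. The module name split_bench, the function throughput/2, and the sample phrase are all made up for illustration, and a single timed run like this is only a ballpark measurement, not a proper benchmark:

```erlang
-module(split_bench).
-export([throughput/2]).

%% Fun is the split function under test (any fun taking one binary).
%% Copies is how many times the sample phrase is repeated to build the input.
%% Returns an approximate throughput in MB/s (decimal megabytes).
throughput(Fun, Copies) when is_function(Fun, 1), is_integer(Copies), Copies > 0 ->
    Input = binary:copy(<<"Hello my friend ">>, Copies),
    %% timer:tc/1 returns the elapsed time in microseconds
    {Micros, _Words} = timer:tc(fun() -> Fun(Input) end),
    %% bytes per microsecond is the same as megabytes per second;
    %% max/2 guards against a zero reading on tiny inputs
    byte_size(Input) / max(Micros, 1).
```

For example, split_bench:throughput(fun(B) -> binary:split(B, <<" ">>, [global]) end, 100000) times binary:split/3 on about 1.6 MB of input. Pass a fun wrapping the split/1 above to compare.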

Upvotes: 0

Pouriya

Reputation: 1626

If you don't want to use the standard library, you can use:

-module(split).

%% API:
-export([split/1]).


split(Bin) when is_binary(Bin) ->
    split(Bin, <<>>, []).


%% A space while the buffer is empty (leading or repeated spaces): skip it
split(<<$\s, Rest/binary>>, <<>>, Result) ->
    split(Rest, <<>>, Result);
%% A space while the buffer is not empty: the buffer is a complete word,
%% so add it to the list of words and reset the buffer
split(<<$\s, Rest/binary>>, Buffer, Result) ->
    split(Rest, <<>>, [Buffer|Result]);
%% Any other character: append it to the buffer
split(<<Char:8, Rest/binary>>, Buffer, Result) ->
    split(Rest, <<Buffer/binary, Char>>, Result);
%% Input and buffer are both empty: reverse the accumulated words and return
split(<<>>, <<>>, Result) ->
    lists:reverse(Result);
%% Input is empty but the buffer holds a word: add it, then reverse and return
split(<<>>, Buffer, Result) ->
    lists:reverse([Buffer|Result]).

Testing the code above:

1> split:split(<<"test">>).
[<<"test">>]
2> split:split(<<"  test  ">>).
[<<"test">>]
3> split:split(<<"  te st  ">>).
[<<"te">>,<<"st">>]
4> split:split(<<"">>).         
[]
5> split:split(<<"     ">>).
[]

Upvotes: 1

amin saffar

Reputation: 2033

You can simply use string:lexemes/2:

http://erlang.org/doc/man/string.html

lexemes(String :: unicode:chardata(), SeparatorList :: [grapheme_cluster()]) -> [unicode:chardata()]

Returns a list of lexemes in String, separated by the grapheme clusters in SeparatorList.

string:lexemes("foo bar", " ").
["foo","bar"]
string:lexemes(<<"foo bar">>, " ").
[<<"foo">>,<<"bar">>]

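Another option is string:split/3. Note that it splits only once (at the first separator with leading, or the last with trailing) unless you pass all as the third argument:

string:split(<<"foo bar">>, " ", trailing).
[<<"foo">>,<<"bar">>]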
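The two functions behave differently when the input has repeated spaces. A small sketch to show it, assuming OTP 20 or later (where these string functions were added). The module name str_words and the function demo/0 are made up for the example:

```erlang
-module(str_words).
-export([demo/0]).

%% Each match below fails with badmatch if the result differs,
%% so demo/0 returning ok confirms both behaviours.
demo() ->
    Bin = <<"Hello  my friend">>,   % note the double space
    %% string:split/3 with all splits at every separator and keeps
    %% the empty part between the two adjacent spaces
    [<<"Hello">>, <<>>, <<"my">>, <<"friend">>] = string:split(Bin, " ", all),
    %% string:lexemes/2 treats a run of separators as a single one
    [<<"Hello">>, <<"my">>, <<"friend">>] = string:lexemes(Bin, " "),
    ok.
```

So for the output asked for in the question, lexemes is the closer fit when the input may contain extra spaces.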

Upvotes: 1
