Reputation: 137
I am trying to split a binary up into 80-character chunks.
Li= <<"Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Maecenas vitae ligula urna. Etiam id pulvinar arcu. Ut
maximus eros sed ligula blandit aliquet. Vivamus arcu urna,
efficitur cursus dapibus nec, cursus sit amet elit. Aliquam
tortor magna, aliquet vulputate nulla sit amet, efficitur cras amet.">>.
I have tried re:split(Li,"(.{80})"), which gives me:
[<<>>,
<<"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas vitae ligula u">>,
<<>>,
<<"rna. Etiam id pulvinar arcu. Ut maximus eros sed ligula blandit aliquet. Vivamus">>,
<<>>,
<<" arcu urna, efficitur cursus dapibus nec, cursus sit amet elit. Aliquam tortor m">>,
<<"agna, aliquet vulputate nulla sit amet, efficitur cras amet.">>]
How do I get rid of the empty parts of the list and why am I getting them?
Upvotes: 1
Views: 547
Reputation: 20916
You could do
re:run(B, <<".{80}">>,[{capture,first,binary},global]).
but note that it returns {match, Matches}, where Matches is a list of one-element lists of binaries rather than a flat list of binaries.
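A minimal sketch of flattening that result into a plain list of binaries. The module and function names (chunk_re, chunks/1) are hypothetical. Two details differ from the call above and are assumptions about what is wanted: the pattern is .{1,80} rather than .{80}, so a trailing piece shorter than 80 bytes is kept, and the dotall option is added so that . also matches newlines.

```erlang
-module(chunk_re).
-export([chunks/1]).

%% Split a binary into pieces of at most 80 bytes using re:run/3.
%% Returns a flat list of binaries (hypothetical helper, not part of re).
chunks(B) when is_binary(B) ->
    case re:run(B, <<".{1,80}">>, [global, dotall, {capture, first, binary}]) of
        {match, Matches} ->
            %% With global, each match is a one-element list; unwrap it.
            [Chunk || [Chunk] <- Matches];
        nomatch ->
            %% An empty binary produces no match at all.
            []
    end.
```

Without the unicode option the pattern counts bytes, so this is only equivalent to counting characters for single-byte text.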
Upvotes: 0
Reputation: 2392
You're getting empty parts because those are the (empty) portions between your matches. re:split (like string:tokens) returns the data around the matched portions, not the matched portions themselves. The only reason you are receiving the eighty-character chunks at all is that your regular expression contains a capturing group.
To the best of my knowledge, there is no way to remove the empty parts of your result (without explicit filtering), because those are the parts that re:split expects to return.
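For completeness, explicit filtering could look something like the sketch below. The module and function names (chunk_split, chunks/1) are hypothetical, and the {return, binary} option is added here so that the comparison against <<>> is reliable. It assumes the input contains no newlines, since . does not match them by default.

```erlang
-module(chunk_split).
-export([chunks/1]).

%% Split with the capturing group, then drop the empty binaries that
%% re:split/3 returns for the (empty) text around consecutive matches.
chunks(B) when is_binary(B) ->
    Parts = re:split(B, "(.{80})", [{return, binary}]),
    [Part || Part <- Parts, Part =/= <<>>].
```

This keeps the final piece that is shorter than 80 bytes, because it is the text left over after the last match.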
One way you could achieve the desired result would be to use a standard regular expression (as opposed to splitting):
{match,[[<<"ab">>],[<<"cd">>],[<<"ef">>]]} = re:run("abcdefg", ".{2}", [global, {capture, all, binary}]).
As you can see, we're simply matching all two-character groups we can find in the string.
That being said, regular expressions are not the ideal solution for this; they're overkill, to say the least. It should be relatively simple to write a function which extracts eighty-character chunks (or however many remain) from the binary. For instance:
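%% Take 80 bytes off the front and recurse on the remainder.
make_chunks(<<C:80/binary, Rest/binary>>) ->
    [C|make_chunks(Rest)];
%% Nothing left: end of the list.
make_chunks(<<>>) ->
    [];
%% Fewer than 80 bytes left: return them as the final chunk.
make_chunks(<<Rest/binary>>) ->
    [Rest].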
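That would also work and doesn't require complex evaluations or compiling of a regular expression. Note that it counts bytes, not characters. If you intend to handle Unicode, be aware that the "utf8" segment type does not accept a size, so something like <<C:80/utf8>> is not valid; a utf8 segment matches one code point at a time (<<C/utf8, Rest/binary>>), and you would have to consume the code points one by one.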
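A rough sketch of a Unicode-aware variant, assuming the input is valid UTF-8 and that "character" means code point (not grapheme cluster). Because a utf8 segment matches a single code point and takes no size, this version walks the binary one code point at a time. The module name chunk_utf8 and the chunks/2 signature are hypothetical.

```erlang
-module(chunk_utf8).
-export([chunks/2]).

%% Split a UTF-8 binary into pieces of at most N code points each.
chunks(Bin, N) when is_binary(Bin), is_integer(N), N > 0 ->
    chunks(Bin, N, 0, <<>>, []).

%% Count is the number of code points collected so far in Acc.
chunks(<<>>, _N, 0, _Acc, Chunks) ->
    %% Input exhausted and no partial chunk pending.
    lists:reverse(Chunks);
chunks(<<>>, _N, _Count, Acc, Chunks) ->
    %% Input exhausted: emit whatever was collected as the last chunk.
    lists:reverse([Acc | Chunks]);
chunks(Bin, N, N, Acc, Chunks) ->
    %% Current chunk is full (Count =:= N); start a new one.
    chunks(Bin, N, 0, <<>>, [Acc | Chunks]);
chunks(<<C/utf8, Rest/binary>>, N, Count, Acc, Chunks) ->
    %% Move one code point from the input to the current chunk.
    chunks(Rest, N, Count + 1, <<Acc/binary, C/utf8>>, Chunks).
```

Calling chunk_utf8:chunks(Li, 80) would then give chunks of up to 80 code points. Input that is not valid UTF-8 fails with a function_clause error, which may or may not be the behaviour you want.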
Upvotes: 2
Reputation: 91385
I don't know Erlang, but in many languages, when you split on a regex with a capture group, as you do, the group is included in the result.
So, you are splitting on 80 characters and keeping the delimiter.
The result is:
'' : this is what there is before the first delimiter (i.e., before the first 80 characters)
Lorem ipsum ... ligula u : this is the first delimiter (i.e., the 80 characters)
'' : this is what there is between the first and second delimiters.
Upvotes: 1