Nordlöw

Stream-Based Processing of Range Chunks in D

I'm looking for an elegant way to perform chunk-stream-based processing of arrays/ranges. I'm building a file indexing/search engine in D that calculates various kinds of statistics on files, such as histograms and SHA1 digests. I want these calculations to be performed in a single pass over the data, for better data-access locality.

Here is an excerpt from the engine:

/** Process File in Cache Friendly Chunks. */
void calculateCStatInChunks(immutable (ubyte[]) src,
                            size_t chunkSize, bool doSHA1, bool doBHist8) {
    if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
    if (!_cstat.bhist8.allZeros) { doBHist8 = false; }

    import std.digest.sha;
    SHA1 sha1;
    if (doSHA1) { sha1.start(); }

    import std.range: chunks;
    foreach (chunk; src.chunks(chunkSize)) {
        if (doSHA1) { sha1.put(chunk); }
        if (doBHist8) { /*...*/ }
    }

    if (doSHA1) {
        _cstat.contentsDigest = sha1.finish();
    }
}

This does not seem like a very elegant (functional) approach, as I have to spread the logic for each statistic (reducer) across three different places in the code, namely start, put and finish.
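
For context, the start/put/finish triple above is the std.digest API, and a digest such as SHA1 is also an output range of byte slices, which is what lets generic code treat it like any other sink. The following is a small sketch, assuming a reasonably recent Phobos where `toHexString` lives in `std.digest`; the data values are made up for illustration. It checks that feeding SHA1 chunk by chunk through `std.range.put` yields the same digest as hashing everything at once:

```d
import std.digest : toHexString;
import std.digest.sha : SHA1, sha1Of;
import std.range : chunks, isOutputRange, put;
import std.stdio : writeln;

// SHA1 has a put method taking const(ubyte)[], so it models an output range.
static assert(isOutputRange!(SHA1, const(ubyte)[]));

void main()
{
    immutable(ubyte)[] data = [1, 2, 3, 4, 5];

    SHA1 hasher;
    hasher.start();                     // reset the internal state
    foreach (chunk; data.chunks(2))     // chunks of at most 2 bytes
        put(hasher, chunk);             // std.range.put forwards to hasher.put
    auto chunked = hasher.finish();     // ubyte[20]

    // Hashing chunk by chunk gives the same digest as hashing in one go.
    assert(chunked == sha1Of(data));
    writeln(toHexString(chunked));
}
```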

Does anybody have suggestions or references for Haskell-monad-like stream-based APIs that could make this code more D-style and component-based?

Answers (1)

Sergei Nosov

I have no experience with Haskell, but let me share how I would approach this in D. Maybe it will be of some use.

First off, I think that

if (!_cstat.contentsDigest[].allZeros) { doSHA1 = false; }
if (!_cstat.bhist8.allZeros) { doBHist8 = false; }

should be moved out of the function. These checks are specific to the particular statistics being computed, so if we want a general solution, they can't stay inside it.
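
`allZeros` is not part of Phobos, so it is presumably the asker's own helper. Assuming it simply checks that every element is zero, a caller-side version of these checks could look like the sketch below. The names `storedDigest` and `storedHist` are hypothetical stand-ins for `_cstat.contentsDigest` and `_cstat.bhist8`, and the `size_t[256]` histogram type is a guess:

```d
import std.algorithm.searching : all;
import std.stdio : writeln;

// Hypothetical stand-in for the asker's `allZeros` helper: true when every
// element of the range equals zero (e.g. a result that was never computed).
bool allZeros(R)(R range)
{
    return range.all!(e => e == 0);
}

void main()
{
    ubyte[20] storedDigest;     // zero-initialised: digest not computed yet
    size_t[256] storedHist;     // assumed histogram layout
    storedHist[42] = 1;         // pretend the histogram was filled earlier

    // The caller, not the chunk-processing function, decides what to compute.
    immutable doSHA1 = storedDigest[].allZeros;
    immutable doBHist8 = storedHist[].allZeros;

    writeln("doSHA1 = ", doSHA1, ", doBHist8 = ", doBHist8);
    // prints "doSHA1 = true, doBHist8 = false"
}
```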

The same could be said about

_cstat.contentsDigest = sha1.finish();

What we do with the results is a separate concern.

So, moving this logic out of the function and adding some templates gives the following code:

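import std.digest.sha;
import std.stdio;

// Feeds every element (here: every chunk) of `src` to each of the `targets`.
// Each target must provide a put method; the inner foreach iterates over the
// compile-time tuple of targets and is unrolled by the compiler.
void copyToMany(R, T...)(R src, T targets)
{
    foreach (element; src)
    {
        foreach (t; targets)
            t.put(element);
    }
}

void main()
{
    import std.range : chunks;

    // Three bytes split into chunks of at most two bytes: [1, 2] and [3].
    auto input = (cast(immutable(ubyte)[]) [1, 2, 3]).chunks(2);

    SHA1 sha1 = makeDigest!SHA1();                   // an already started SHA1
    auto reducer = new Reducer!(0, (a, b) => a + b); // sums all bytes

    // A single pass over the input feeds both consumers. SHA1 is a struct,
    // so it is passed by pointer to update the caller's instance.
    input.copyToMany(&sha1, reducer);

    writeln(sha1.finish().toHexString());
    writeln(reducer.result);                         // 6
}

// A generic fold sink: starts at `seed` and combines every element of each
// chunk it receives with `Func`.
class Reducer(alias seed, alias Func)
{
    typeof(seed) result = seed;

    void put(R)(R r)
    {
        foreach (e; r)
        {
            result = Func(result, e);
        }
    }
}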
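
The byte histogram left as `/*...*/` in the question fits the same pattern. As a sketch, here is a hypothetical `ByteHistogram` sink (its name and `size_t[256]` layout are assumptions, not the asker's `bhist8` type) fed in the same pass as SHA1 by a copy of `copyToMany`:

```d
import std.digest : toHexString;
import std.digest.sha : SHA1;
import std.range : chunks;
import std.stdio : writeln;

// Same fan-out helper as above: every chunk goes to every sink.
void copyToMany(R, T...)(R src, T targets)
{
    foreach (element; src)
        foreach (t; targets)    // unrolled at compile time over the sink tuple
            t.put(element);
}

// Hypothetical 8-bit byte histogram sink: counts how often each byte value
// occurs in the chunks it receives.
struct ByteHistogram
{
    size_t[256] counts;

    void put(const(ubyte)[] chunk)
    {
        foreach (b; chunk)
            ++counts[b];
    }
}

void main()
{
    immutable(ubyte)[] data = [1, 2, 2, 3, 3, 3];

    SHA1 sha1;
    sha1.start();
    ByteHistogram hist;

    // One pass over the data feeds both statistics.
    data.chunks(4).copyToMany(&sha1, &hist);

    writeln(sha1.finish().toHexString());
    writeln(hist.counts[1 .. 4]);   // prints [1, 2, 3]
}
```

Both sinks are passed by pointer: SHA1 and ByteHistogram are structs, so passing them by value would update copies inside `copyToMany` and leave the caller's instances untouched.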
