Get byte representation of C++ class

I have objects that I need to hash with SHA256. Each object has several fields, as follows:

#include <array>

class Foo {
    // some methods
    protected:
       std::array<int, 32> x;
       char y[32];
       long z;
};

Is there a way I can directly access the bytes representing the 3 member variables in memory, as I would with a struct? These hashes need to be computed as quickly as possible, so I want to avoid malloc'ing a new buffer and copying the fields into a heap-allocated array. Or is the answer simply to embed a struct within the class?

It is critical that I get the exact binary representation of these variables, so that the SHA256 comes out exactly the same whenever the 3 variables are equal (so no padding bytes etc. can be included in the input to the hash function).
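Whichever approach you pick, the compiler can tell you whether the class has padding at all. A minimal sketch, assuming C++17 or later: std::has_unique_object_representations_v is true only for trivially copyable types in which no bit is padding, so objects with equal fields are guaranteed to have identical bytes, and hashing sizeof(Foo) bytes starting at &foo covers exactly the three fields. The get_z() method is a hypothetical stand-in for "some methods". The raw bytes still depend on the platform's endianness and sizeof(long), so they only match between builds for the same platform.

```cpp
#include <array>
#include <type_traits>

class Foo {
public:
    long get_z() const { return z; }  // hypothetical stand-in for "some methods"

protected:
    std::array<int, 32> x;
    char y[32];
    long z;
};

// Holds only if Foo is trivially copyable and contains no padding bits,
// i.e. objects with equal fields always have identical bytes.
static_assert(std::has_unique_object_representations_v<Foo>,
              "Foo contains padding; hash its members one at a time instead");
```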

Upvotes: 3

Answers (4)

Turtlefight

As long as you're able to make your class an aggregate, i.e. std::is_aggregate_v<T> == true, you can actually sort-of reflect the members of the structure.

This allows you to easily hash the members without actually having to name them (and you don't have to remember to update your hash function every time you add a new member).
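For the question's class this means giving up the protected access, because an aggregate may not have private or protected direct non-static data members (nor user-declared constructors or virtual functions). A quick compile-time check; FooClass and FooFields are hypothetical names for the two variants:

```cpp
#include <array>
#include <type_traits>

// The question's layout: protected data members rule out aggregate status.
class FooClass {
protected:
    std::array<int, 32> x;
    char y[32];
    long z;
};

// The same fields, public and without user-declared constructors: an aggregate.
struct FooFields {
    std::array<int, 32> x;
    char y[32];
    long z;
};

static_assert(!std::is_aggregate_v<FooClass>);
static_assert(std::is_aggregate_v<FooFields>);
```

If the class itself has to keep its access specifiers, a public aggregate like FooFields can be embedded as a member (the "embed a struct" option from the question) and reflected instead.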

Step 1: Getting the number of members inside the aggregate

First we need to know how many members a given aggregate type has.
We can check this by (ab-)using aggregate initialization.

Example:
Given struct Foo { int i; int j; };:

Foo a{}; // ok
Foo b{{}}; // ok
Foo c{{}, {}}; // ok
Foo d{{}, {}, {}}; // error: too many initializers for 'Foo'

We can use this to get the number of members inside the struct by adding more initializers until we get an error:


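#include <cstddef>
#include <type_traits>

template<class T>
concept aggregate = std::is_aggregate_v<T>;

// Converts to anything. It is only used in unevaluated contexts
// (inside the requires-expression), so the operator needs no definition.
struct any_type {
    template<class T>
    operator T() const;
};

// Adds one more initializer per call until T{...} stops compiling;
// the member count is the last number of initializers that still worked.
template<aggregate T>
consteval std::size_t count_members(auto ...members) {
    if constexpr (requires { T{ {members}... }; } == false)
        return sizeof...(members) - 1;
    else
        return count_members<T>(members..., any_type{});
}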

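Notice that I used {members}... instead of members....
This is because of arrays: a structure like struct Bar{int i[2];}; can be initialized with 2 elements, e.g. Bar b{1, 2} (brace elision), so our function would have returned 2 for Bar if we had used members.... Wrapping each initializer in its own braces turns brace elision off, so every {members} initializes exactly one member.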

Step 2: Extracting the members

Now that we know how many members our structure has, we can use structured bindings to extract them.

Unfortunately, C++20 offers no way to write a structured binding declaration with a variable number of names, so we have to add a few extra lines of code for each additional member count we want to support.

For this example I've only added support for up to 4 members, but you can add as many as you like / need:

#include <tuple>

template<aggregate T>
constexpr auto tie_struct(T const& data) {
    constexpr std::size_t fieldCount = count_members<T>();
    if constexpr(fieldCount == 0) {
        return std::tie();
    } else if constexpr (fieldCount == 1) {
        auto const& [m1] = data;
        return std::tie(m1);
    } else if constexpr (fieldCount == 2) {
        auto const& [m1, m2] = data;
        return std::tie(m1, m2);
    } else if constexpr (fieldCount == 3) {
        auto const& [m1, m2, m3] = data;
        return std::tie(m1, m2, m3);
    } else if constexpr (fieldCount == 4) {
        auto const& [m1, m2, m3, m4] = data;
        return std::tie(m1, m2, m3, m4);
    } else {
        static_assert(fieldCount!=fieldCount, "Too many fields for tie_struct! add more if statements!");
    }
}

The fieldCount!=fieldCount in the static_assert is intentional: because the condition depends on T, the compiler can't evaluate it before tie_struct is instantiated, so it only complains if the else branch is actually reached.
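A widely used equivalent is a dependent always_false variable template, which states the intent a little more explicitly. A minimal sketch; always_false and type_category are hypothetical names chosen for this example:

```cpp
#include <type_traits>

// False for every T, but the compiler cannot know that until T is known,
// so the static_assert below only fires when its branch is instantiated.
template<class>
inline constexpr bool always_false = false;

template<class T>
constexpr int type_category() {
    if constexpr (std::is_integral_v<T>) {
        return 1;
    } else if constexpr (std::is_floating_point_v<T>) {
        return 2;
    } else {
        static_assert(always_false<T>, "type_category: unsupported type");
        return 0;  // never reached; keeps every path returning a value
    }
}
```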

Now we have a function that can give us references to each member of an arbitrary aggregate.

Example:

struct Foo {int i; float j; std::string s; };

Foo f{1, 2, "miau"};
// tup is of type std::tuple<int const&, float const&, std::string const&>
auto tup = tie_struct(f);

// this will output "12miau"
std::cout << std::get<0>(tup) << std::get<1>(tup) << std::get<2>(tup) << std::endl;

Step 3: Hashing the members

Now that we can convert any aggregate into a tuple of its members, hashing it shouldn't be a big problem.

You can hash each member however you like and then combine the individual hashes:

#include <cstddef>
#include <functional>
#include <tuple>
#include <type_traits>
#include <utility>

// for merging two hash values
std::size_t hash_combine(std::size_t h1, std::size_t h2)
{
    return (h2 + 0x9e3779b9 + (h1<<6) + (h1>>2)) ^ h1;
}

// Handling primitives
template <class T, class = void>
struct is_std_hashable : std::false_type { };

template <class T>
struct is_std_hashable<T, std::void_t<decltype(std::declval<std::hash<T>>()(std::declval<T>()))>> : std::true_type { };

template <class T>
concept std_hashable = is_std_hashable<T>::value;

template<std_hashable T>
std::size_t hash(T const& value) {
    return std::hash<T>{}(value);
}

template<class T, std::size_t I>
using Arr = T[I];

// Declare the array and struct overloads up front, so the recursive
// hash(members) calls below find them even for members such as int[N],
// whose type has no associated namespace for ADL.
template<class T, std::size_t I>
std::size_t hash(Arr<T, I> const& arr);

template<aggregate T>
std::size_t hash(T const& agg);

// Handling tuples
template<class... Members>
std::size_t hash(std::tuple<Members...> const& tuple) {
    return std::apply([](auto const&... members) {
        std::size_t result = 0;
        ((result = hash_combine(result, hash(members))), ...);
        return result;
    }, tuple);
}

// Handling arrays
template<class T, std::size_t I>
std::size_t hash(Arr<T, I> const& arr) {
    std::size_t result = 0;
    for(T const& elem : arr) {
        std::size_t h = hash(elem);
        result = hash_combine(result, h);
    }
    return result;
}

// Handling structs
template<aggregate T>
std::size_t hash(T const& agg) {
    return hash(tie_struct(agg));
}

This allows you to hash basically any aggregate struct, even with arrays and nested structs:

struct Foo{ int i; double d; std::string s; };
struct Bar { Foo k[10]; float f; };

std::cout << hash(Foo{1, 1.2f, "miau"}) << std::endl;
std::cout << hash(Bar{}) << std::endl;

full example on godbolt

Footnotes

This only works with aggregates.
There's no need to worry about padding, because we access the members directly.
You have to add a few more ifs to tie_struct if you need more than 4 members.
The provided hash() function doesn't handle all types: if you need e.g. std::array, std::pair, etc., you need to add overloads for those.
It's a lot of boilerplate code, but it's insanely powerful.
You can also use Boost.PFR for the aggregate-to-tuple part, if you are allowed to use Boost.
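As an illustration of such an extra overload, here is a hypothetical one for std::array in the same style as the C-array overload. To stay self-contained it hashes the elements with std::hash directly; inside the full overload set you would call the recursive hash(elem) instead and declare the overload before the tuple overload.

```cpp
#include <array>
#include <cstddef>
#include <functional>

std::size_t hash_combine(std::size_t h1, std::size_t h2) {
    return (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2)) ^ h1;
}

// Hypothetical std::array overload: combines the element hashes in order.
template<class T, std::size_t N>
std::size_t hash(std::array<T, N> const& arr) {
    std::size_t result = 0;
    for (T const& elem : arr) {
        result = hash_combine(result, std::hash<T>{}(elem));
    }
    return result;
}
```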

Upvotes: 0

JohnFilleau

You can solve this by making an iterator that knows the layout of your member variables. Make Foo::begin() and Foo::end() functions and you can even take advantage of range-based for loops.

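Anything that can be incremented, dereferenced, and compared can be used in generic code that only needs those operations, such as a range-based for loop or the checksum template below. Strictly speaking it is not yet a LegacyForwardIterator: that would also need the std::iterator_traits member types, and operator* would have to return a reference rather than a char.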

Add comparison operators to support the common it = X.begin(); it != X.end(); ++it idiom.

Some downsides include: ugly library code, poor maintainability, and (in this current form) no regard for endianness.

The solution to the latter downside is left as an exercise to the reader.

#include <array>
#include <iostream>

// Forward declaration: Foo::begin()/end() return FooByteIter by value.
class FooByteIter;

class Foo {
    friend class FooByteIter;

public:
    FooByteIter begin() const;

    FooByteIter end() const;

    Foo(const std::array<int, 2>& x, const char (&y)[2], long z)
    : x_{x}
    , y_{y[0], y[1]}
    , z_{z}
    {}

protected:
    std::array<int, 2> x_;
    char y_[2];
    long z_;
};

class FooByteIter {
public:
    FooByteIter(const Foo& foo)
        : ptr_{reinterpret_cast<const char*>(&(foo.x_))}
        , x_end_{reinterpret_cast<const char*>(&(foo.x_)) + sizeof(foo.x_)}
        , y_begin_{reinterpret_cast<const char*>(&(foo.y_))}
        , y_end_{reinterpret_cast<const char*>(&(foo.y_)) + sizeof(foo.y_)}
        , z_begin_{reinterpret_cast<const char*>(&(foo.z_))}
    {}

    static FooByteIter end(const Foo& foo) {
        FooByteIter fbi{foo};
        fbi.ptr_ = reinterpret_cast<const char*>(&foo.z_) + sizeof(foo.z_);

        return fbi;
    }

    bool operator==(const FooByteIter& other) const { return ptr_ == other.ptr_; }
    bool operator!=(const FooByteIter& other) const { return ! (*this == other); }

    FooByteIter& operator++() {
        ptr_++;
        if (ptr_ == x_end_) {
            ptr_ = y_begin_;
        }
        else if (ptr_ == y_end_) {
            ptr_ = z_begin_;
        }

        return *this;
    }

    FooByteIter operator++(int) {
        FooByteIter pre = *this;
        ++(*this);  // use the prefix operator; (*this)++ here would recurse forever
        return pre;
    }

    char operator*() const {
        return *ptr_;
    }

private:
    const char* ptr_;

    // Not const, so the iterator stays copy-assignable (it = other).
    const char* x_end_;
    const char* y_begin_;
    const char* y_end_;
    const char* z_begin_;
};

FooByteIter Foo::begin() const {
    return FooByteIter(*this);
}

FooByteIter Foo::end() const {
    return FooByteIter::end(*this);
}

template <typename InputIt>
char checksum(InputIt first, InputIt last) {
    char check = 0;
    while (first != last) {
        check += (*first);
        ++first;
    }

    return check;
}

int main() {
    Foo f{{1, 2}, {3, 4}, 5};
    for (const auto b : f) {
        std::cout << (int)b << ' ';
    }

    std::cout << std::endl;

    std::cout << "Checksum is: " << (int)checksum(f.begin(), f.end()) << std::endl;
}

You can generalize this further by making serialization functions for all data types you might care about, allowing serialization of classes that aren't plain-old-data types.
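As one example of such a serialization function, and one way to tackle the endianness exercise, integers can be written byte by byte with shifts, which yields the same bytes on little- and big-endian machines. A minimal sketch; to_le_bytes is a hypothetical name, and the little-endian output order is an arbitrary choice:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Serializes an unsigned integer as little-endian bytes. The bytes are
// extracted with shifts rather than read from memory, so the output does
// not depend on the host's byte order.
template<class UInt>
constexpr std::array<std::uint8_t, sizeof(UInt)> to_le_bytes(UInt value) {
    static_assert(std::is_unsigned_v<UInt>,
                  "convert signed values to a fixed-width unsigned type first");
    std::array<std::uint8_t, sizeof(UInt)> out{};
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
    return out;
}
```

Because sizeof(long) differs between platforms (typically 8 bytes on 64-bit Linux and macOS, 4 on Windows), a portable hash should also fix the width, e.g. to_le_bytes(static_cast<std::uint64_t>(z)), which always yields 8 bytes.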

Warning

This code assumes that the member types being serialized have no internal padding themselves. It works for this Foo because std::array<int, 2>, char[2], and long contain no padding on common platforms. To make it work for members whose types do contain padding, the same technique would have to be applied recursively, all the way down to the scalar members.

Upvotes: 2

Fozi

Most hash classes let you feed in several memory regions before you ask for the final digest, e.g.:

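#include <cstddef>
#include <cstdint>
#include <vector>

// An illustrative incremental-hash interface.
class Hash {
    public:
        virtual ~Hash() = default;
        virtual void update(const void *data, std::size_t size) = 0;
        virtual std::vector<std::uint8_t> digest() = 0;
};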

So your hash method could look like this:

std::vector<uint8_t> Foo::hash(Hash *hash) const {
    hash->update(&x, sizeof(x));
    hash->update(&y, sizeof(y));
    hash->update(&z, sizeof(z));
    return hash->digest();
}
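To make this runnable without a crypto library, here is a sketch with a stand-in implementation of the interface. Fnv1a64 is a hypothetical class using 64-bit FNV-1a, which is not SHA256 and not cryptographically secure; with a real library you would forward update() and digest() to its incremental API (for example OpenSSL's EVP_DigestUpdate and EVP_DigestFinal_ex). The Foo constructor is also an assumption, added only so the example can create objects. Each member is hashed in place, so the object is never copied, but the bytes still depend on endianness and sizeof(long).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

class Hash {
public:
    virtual ~Hash() = default;
    virtual void update(const void* data, std::size_t size) = 0;
    virtual std::vector<std::uint8_t> digest() = 0;
};

// Stand-in hasher (64-bit FNV-1a) so the example runs without a crypto library.
class Fnv1a64 : public Hash {
public:
    void update(const void* data, std::size_t size) override {
        auto const* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < size; ++i) {
            state_ ^= p[i];
            state_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    }
    std::vector<std::uint8_t> digest() override {
        std::vector<std::uint8_t> out(8);
        for (std::size_t i = 0; i < 8; ++i) {
            out[i] = static_cast<std::uint8_t>(state_ >> (8 * i));  // little-endian
        }
        return out;
    }

private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV-1a offset basis
};

class Foo {
public:
    explicit Foo(long z_value) : z{z_value} {}  // assumed constructor

    // Feeds each member separately, so padding between members is never hashed.
    std::vector<std::uint8_t> hash(Hash* h) const {
        h->update(&x, sizeof(x));
        h->update(&y, sizeof(y));
        h->update(&z, sizeof(z));
        return h->digest();
    }

protected:
    std::array<int, 32> x{};
    char y[32]{};
    long z;
};
```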

Upvotes: 3

Ilya Veselov

Just cast a pointer to the object to a pointer to (unsigned) char. You can then iterate through the bytes by incrementing that pointer, using sizeof(foo) as the bound so you don't read past the end of the object.
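A minimal sketch of this approach, with a guard added: the sizeof(foo) bytes include any padding the compiler inserted, and padding bytes have unspecified values, so the result is only reliable for types without padding. byte_sum is a hypothetical name, and summing the bytes merely stands in for feeding them to a hash function.

```cpp
#include <cstddef>
#include <type_traits>

// Walks the object representation of `obj` one byte at a time through an
// unsigned char pointer (which may alias any object) and sums the bytes.
// The static_assert rejects types that may contain padding.
template<class T>
std::size_t byte_sum(T const& obj) {
    static_assert(std::has_unique_object_representations_v<T>,
                  "T may contain padding; process its members individually instead");
    auto const* p = reinterpret_cast<unsigned char const*>(&obj);
    std::size_t sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {  // sizeof(T) bounds the walk
        sum += p[i];
    }
    return sum;
}
```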