Misuse of boost::phoenix::static_cast_ to get object behind placeholder

Question

Here is my issue. I am experimenting with boost::spirit::qi and trying to use placeholders like "_1" and "_a". I would like to access the underlying object "behind" a boost::qi/phoenix placeholder, but I'm struggling a bit here.

Let's say I have the following class:

class Tag {
public:
  Tag() = default; // Needed by qi
  Tag(std::uint8_t _raw_tag) : m_raw_tag( _raw_tag ) {}

  std::uint8_t get_size() const { return m_raw_tag & 0b111; }
  std::uint8_t get_type() const { return m_raw_tag & 0b1000; }

private:
  std::uint8_t m_raw_tag;
};

I have to parse frames starting with a tag byte that tells me what to read next. To do this, I have written a little helper class named Tag that unmasks pieces of information such as the type of the tag or the size of the data that follows. I always store the data in a std::uint32_t, but the size of the data may be 3 bytes rather than a pre-defined 1, 2 or 4, for which I could use qi::byte_, qi::big_word or qi::big_dword respectively (assuming big endianness). Therefore, I'm thinking about reading the data byte by byte and bit-shifting each byte into the output std::uint32_t. That would give a parser like this, in pseudo C++ code:

template <typename _Iterator>
// _a holds the byte count, hence a std::uint8_t local
struct Read_frame : qi::grammar<_Iterator, std::uint32_t(), qi::locals<std::uint8_t>> {
  Read_frame() : Read_frame::base_type(data_parser)
  {
    using boost::spirit::qi::byte_;
    using boost::spirit::qi::eps;
    using boost::spirit::qi::omit;
    using boost::spirit::qi::repeat;
    using boost::spirit::qi::_val;
    using namespace qi::labels;
    tag_parser %= byte_;
    // we read what's in the tag but we don't store it
    // Calling the method get_size() of Tag is my issue, I don't know how to do it
    data_parser %= omit[tag_parser[ _a = _1.get_size()]] >> eps[_val = 0]
      >> repeat(_a)[ byte_[ _val += (_1 << (--_a * 8)) ] ];
  }

  qi::rule<_Iterator, std::uint32_t(), qi::locals<std::uint8_t>> data_parser;
  qi::rule<_Iterator, Tag()> tag_parser;
};

The line:

data_parser %= omit[tag_parser[ _a = _1.get_size()]] >> eps[_val = 0]

is where my problem lies. I don't know how to call a method of Tag in a semantic action. I thought about using boost::phoenix::static_cast_(&_1)->get_size() or something similar, but it does not work.
This is the first time I'm using boost::spirit together with boost::phoenix and, to be quite honest, I don't think I really understand how the placeholders in boost work, nor the principle behind boost::phoenix::static_cast_. That's why I'm asking for your help here :). If you need more details, I will gladly provide them.

Thanks in advance,

A newbie with boost spirit

sehe · Accepted Answer

Semantic actions are lazy phoenix actors. That is, they are "deferred functions". You can also see them as dynamically defined composed functions.

The "value behind a placeholder" depends on the context. That context is runtime. The Phoenix transformation ("evaluation") uses that context to retrieve the actual object behind the placeholder during invocation.

The last part is the point: any runtime effect must be deferred until invocation. That means that you need a Phoenix actor to access the get_size() method and lazily invoke it.
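To make the "deferred" part concrete, here is a toy model in plain C++ with no Boost involved. It is only a sketch of the idea, not how Phoenix is implemented: Arg1, LazyGetSize, lazy_get_size and size_via_lazy_expression are names invented for this illustration, and Tag is a cut-down stand-in for the class in the question.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the question's Tag (getter made const).
struct Tag {
    std::uint8_t raw;
    std::uint8_t get_size() const { return raw & 0b111; }
};

// A toy "placeholder": it holds no value, it only knows how to fetch one
// from the context it is eventually invoked with.
struct Arg1 {
    template <typename T> T const& operator()(T const& ctx) const { return ctx; }
};

// A toy "lazy member call": wraps another lazy expression and only calls
// get_size() once a context is supplied.
template <typename Inner> struct LazyGetSize {
    Inner inner;
    template <typename Ctx> auto operator()(Ctx const& ctx) const {
        return inner(ctx).get_size();
    }
};

template <typename Inner> LazyGetSize<Inner> lazy_get_size(Inner inner) { return {inner}; }

// Build the expression first, evaluate it later against an actual Tag.
inline std::uint8_t size_via_lazy_expression(std::uint8_t raw) {
    auto expr = lazy_get_size(Arg1{}); // nothing is evaluated here
    return expr(Tag{raw});             // the Tag only exists now
}

// Small self-check; call it from main() to run the example.
inline void check_lazy_expression() {
    assert(size_via_lazy_expression(0b00001011) == 3); // 0b1011 & 0b111
}
```

Writing `Arg1{}.get_size()` would not compile for the same reason `_1.get_size()` does not: the placeholder type has no such member, only the object it later resolves to does. In real Phoenix the wrapper has to be an actor, which is what the options below provide.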

Clumsy? You bet. The whole semantic-action eDSL is limited. Luckily, there are many ways to approach this:

- you can use phoenix::bind with a pointer-to-member function
- you can use many predefined lazy functions for things like construction or most of STL (#include <boost/phoenix/stl.hpp>).

  Incidentally, phoenix::size doesn't work for your type because it doesn't adhere to STL conventions (size_t T::size() const instead of get_size).
- you can write your own actors as polymorphic function objects, and adapt them either
  - with BOOST_PHOENIX_ADAPT_CALLABLE
  - with phoenix::function<>

  In fact my favorite take on this has become px::function f = [](auto& a, auto& b) { return a + b; };, fully leveraging C++17 CTAD

Let's demonstrate all or most of these.

Step #1: Pinning Down Behaviour

As mentioned in my comment, I'm a bit confused by the apparent behavior of the parser as given, so let's first pin it down using the phoenix::bind approach as an example:

template <typename Iterator> struct Read_frame : qi::grammar<Iterator, uint32_t(), qi::locals<uint8_t>> {
    Read_frame() : Read_frame::base_type(data_parser) {
        using namespace qi::labels;

        tag_parser = qi::byte_;

        auto _size = px::bind(&Tag::get_size, _1);
        constexpr qi::_a_type _len;

        data_parser                                  //
            = tag_parser[(_len = _size, _val = 0)]   //
            >> qi::repeat(_len)[                     //
                   qi::byte_[_val += (_1 << --_len)] //
        ];
    }

    qi::rule<Iterator, uint32_t(), qi::locals<uint8_t>> data_parser;
    qi::rule<Iterator, Tag()>                            tag_parser;
};

Note several other simplifications/readability tricks. Now with some test cases Live On Compiler Explorer:

PASS [] -> none
PASS [0b00] -> optional(0)
PASS [0b01] -> none
PASS [0b01, 0b101010] -> optional(42)
PASS [0b10, 0b101010] -> none
PASS [0b10, 0b101010, 0b00] -> optional(84)
PASS [0b11, 0b101010, 0b00, 0b00] -> optional(168)
PASS [0b11111111] -> none
PASS [0b11111111, 0b01, 0b10, 0b11, 0b100, 0b101, 0b110, 0b111] -> optional(247)
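To check these expected values by hand, here is a standard-library-only model of what the grammar above computes. It is a sketch, not the Spirit parser: read_frame_model is a name invented here, it needs C++17 for std::optional, and it assumes that bytes beyond the announced length are simply ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// The low three bits of the first byte give the payload length; each
// payload byte is shifted left by the number of payload bytes still to come.
inline std::optional<std::uint32_t> read_frame_model(std::vector<std::uint8_t> const& in) {
    if (in.empty())
        return std::nullopt; // no tag byte at all
    std::size_t const len = in[0] & 0b111;
    if (in.size() - 1 < len)
        return std::nullopt; // not enough payload bytes
    std::uint32_t val = 0;
    std::size_t remaining = len;
    for (std::size_t i = 1; i <= len; ++i)
        val += std::uint32_t{in[i]} << --remaining;
    return val;
}

// Small self-check mirroring two of the cases listed above.
inline void check_read_frame_model() {
    assert(read_frame_model({0b10, 42, 0}) == 84u);
    assert(!read_frame_model({0b01}));
}
```

For the last case the sum is 1<<6 + 2<<5 + 3<<4 + 4<<3 + 5<<2 + 6<<1 + 7 = 64 + 64 + 48 + 32 + 20 + 12 + 7 = 247, which matches the output above.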

Step #2: Simplify

Instead of mutating the qi::local, I'd simply shift incrementally:

    data_parser                                    //
        = tag_parser[(_len = _size, _val = 0)]     //
        >> qi::repeat(_len)[                       //
               qi::byte_[(_val <<= 1, _val += _1)] //
    ];

We have the unit tests now to verify the behavior is the same: Live On Compiler Explorer.
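The equivalence can also be seen without Spirit. The following sketch compares the two accumulation styles on a bare payload; fold_with_countdown and fold_incrementally are names made up for this illustration, and both assume a payload of at most 7 bytes so the shifts stay within a uint32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step #1 style: shift each byte by the number of bytes still to come.
inline std::uint32_t fold_with_countdown(std::vector<std::uint8_t> const& payload) {
    std::uint32_t val = 0;
    std::size_t remaining = payload.size();
    for (std::uint8_t b : payload)
        val += std::uint32_t{b} << --remaining;
    return val;
}

// Step #2 style: shift the accumulator by one bit, then add the next byte.
inline std::uint32_t fold_incrementally(std::vector<std::uint8_t> const& payload) {
    std::uint32_t val = 0;
    for (std::uint8_t b : payload) {
        val <<= 1;
        val += b;
    }
    return val;
}

// Small self-check: both styles agree on a sample payload.
inline void check_folds_agree() {
    std::vector<std::uint8_t> const payload{1, 2, 3, 4, 5, 6, 7};
    assert(fold_with_countdown(payload) == fold_incrementally(payload));
}
```

The incremental form is the usual Horner-style rewrite: every earlier byte ends up shifted once for each byte that follows it, so no counter is needed.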

Step #3: Other Bind Approaches

As promised:

using phoenix::function and C++17 lambda goodness: Live

 px::function get_size = [](Tag const& tag) { return tag.get_size(); };

 data_parser                                       //
     = tag_parser[(_len = get_size(_1), _val = 0)] //
     >> qi::repeat(_len)[                          //
            qi::byte_[(_val <<= 1, _val += _1)]    //
 ];

Note that the nature of deferred function objects is polymorphic, so this works just the same:

 px::function get_size = [](auto& tag) { return tag.get_size(); };
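Stripped of Phoenix, "polymorphic" here just means that the call operator is a template. A minimal sketch in plain C++17, where FixedHeader is a hypothetical second type added only for illustration and Tag is a cut-down stand-in for the class in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the question's Tag, with a const getter.
struct Tag {
    std::uint8_t raw;
    std::uint8_t get_size() const { return raw & 0b111; }
};

// Hypothetical second type, unrelated to Tag, that also offers get_size().
struct FixedHeader {
    std::size_t get_size() const { return 2; }
};

// One generic callable serves both types: the call operator is a template,
// so the argument type is only fixed at each call site.
inline constexpr auto get_size = [](auto const& x) { return x.get_size(); };

// Small self-check; call it from main() to exercise both instantiations.
inline void check_get_size() {
    assert(get_size(Tag{0b1101}) == 5);     // 0b1101 & 0b111 == 0b101
    assert(get_size(FixedHeader{}) == 2u);
}
```

The sketch takes its argument by const reference so that temporaries bind as well; the lambda in the answer uses auto& because Phoenix passes it an lvalue.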

using the same without C++17 goodness: Live

template <typename Iterator> struct Read_frame : qi::grammar<Iterator, uint32_t(), qi::locals<uint8_t>> {
    Read_frame() : Read_frame::base_type(data_parser) {
        using namespace qi::labels;
        constexpr qi::_a_type _len;

        tag_parser = qi::byte_;

        data_parser                                       //
            = tag_parser[(_len = get_size(_1), _val = 0)] //
            >> qi::repeat(_len)[                          //
                   qi::byte_[(_val <<= 1, _val += _1)]    //
        ];
    }

  private:
    struct get_size_f {
        auto operator()(Tag const& tag) const { return tag.get_size(); };
    };
    px::function<get_size_f> get_size{};

    qi::rule<Iterator, uint32_t(), qi::locals<uint8_t>> data_parser;
    qi::rule<Iterator, Tag()>                            tag_parser;
};

using adaptation macros (BOOST_PHOENIX_ADAPT_CALLABLE), Live

 namespace {
     struct get_size_f {
         auto operator()(Tag const& tag) const { return tag.get_size(); };
     };

     BOOST_PHOENIX_ADAPT_CALLABLE(get_size_, get_size_f, 1);
 } // namespace

 template <typename Iterator> struct Read_frame : qi::grammar<Iterator, uint32_t(), qi::locals<uint8_t>> {
     Read_frame() : Read_frame::base_type(data_parser) {
         using namespace qi::labels;
         constexpr qi::_a_type _len;

         tag_parser = qi::byte_;

         data_parser                                        //
             = tag_parser[(_len = get_size_(_1), _val = 0)] //
             >> qi::repeat(_len)[                           //
                    qi::byte_[(_val <<= 1, _val += _1)]     //
         ];
     }

   private:
     qi::rule<Iterator, uint32_t(), qi::locals<uint8_t>> data_parser;
     qi::rule<Iterator, Tag()>                            tag_parser;
 };

BONUS Simplify #2

Still using Qi, I would note that there is nothing in the Tag that necessitates using that as an attribute type. In fact, we need only the trivial bit mask which might be a free function, if you really want. So, this minimal code does the same without much of the unneeded complexity:

Live On Compiler Explorer

#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;

template <typename Iterator> struct Read_frame : qi::grammar<Iterator, uint32_t(), qi::locals<uint8_t>> {
    Read_frame() : Read_frame::base_type(start) {
        using namespace qi::labels;
        start                                          //
            = qi::byte_[(_val = 0, _a = _1 & 0b111)]   //
            >> qi::repeat(_a)[                         //
                   qi::byte_[(_val <<= 1, _val += _1)] //
        ];
    }

  private:
    qi::rule<Iterator, uint32_t(), qi::locals<uint8_t>> start;
};

A free function would be just as easy: Live

start                                                         //
    = qi::byte_[(_val = 0, _a = px::bind(size_from_tag, _1))] //
    >> qi::repeat(_a)[                                        //
           qi::byte_[(_val <<= 1, _val += _1)]                //
];

BONUS Simplify And Modernize

In real life, I'd certainly code a custom parser. You can do so in Spirit Qi, but to go with the times, vastly reduce compile times and just generally make my life easier, I'd go with Spirit X3:

Live On Compiler Explorer

#include <boost/spirit/home/x3.hpp>

namespace Readers {
    namespace x3 = boost::spirit::x3;

    static constexpr uint8_t size_from_tag(uint8_t tag) { return tag & 0b111; }

    struct frame_parser : x3::parser<frame_parser> {
        using attribute_type = uint32_t;
        bool parse(auto& first, auto last, auto&& /*ctx*/, auto&& /*rcontext*/, auto& attr) const {
            if (first == last)
                return false;
            auto    save = first;
            uint8_t tag  = *first++;
            uint8_t len  = size_from_tag(tag);

            uint32_t val = 0;
            while (len && first != last) {
                --len;
                val <<= 1;
                val += static_cast<uint8_t>(*first++);
            }

            if (len == 0) {
                attr = val;
                return true;
            }
            first = save;
            return false;
        }
    } static frame;
} // namespace Readers

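#include <cassert>
#include <optional>
#include <vector>
#include <fmt/ranges.h>
#include <fmt/std.h> // formats std::optional (assumes fmt 10 or later)
int main() {
    using Data = std::vector<uint8_t>;

    struct {
        Data                    input;
        std::optional<uint32_t> expected;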
    } static const cases[]{
        {{}, {}}, // empty input, expect nothing in return
        {{0b0000}, 0},
        {{0b0001}, {}},                     // missing byte
        {{0b0001, 42}, 42},                 // 42
        {{0b0010, 42}, {}},                 // missing byte
        {{0b0010, 42, 0}, 2 * 42},          // 2*42
        {{0b0011, 42, 0, 0}, 4 * 42},       // 4*42
        {{0xff}, {}},                       // requires 7 bytes
        {{0xff, 1, 2, 3, 4, 5, 6, 7}, 247}, // like this
    };

    for (auto& [data, expected] : cases) {
        std::optional<uint32_t> actual;

        auto ok      = parse(begin(data), end(data), -Readers::frame, actual);
        auto pass    = (actual == expected);
        auto verdict = pass ? "PASS" : "FAIL";
        assert(ok); // optional parser should never fail, but we want to be sure
        if (pass)
            fmt::print("{} {::#04b} -> {}\n", verdict, data, actual);
        else
            fmt::print("{} {::#04b} -> {}\n\t *** expected: {}\n", verdict, data, actual, expected);
    }
}

Not only does this compile 10x¹ faster, I suspect it will be way easier for the compiler to optimize. Indeed this program

constexpr uint32_t parse_frame(auto const& input) {
    uint8_t v;
    parse(begin(input), end(input), x3::expect[Readers::frame], v);
    return v;
}

int main() {
    return parse_frame(std::array{0b0010, 42, 0}); // 2*42
}

Optimizes all the way to

main:
        mov     eax, 84
        ret

See it Live On Compiler Explorer including the generated assembly code
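The underlying computation is simple enough to be evaluated at compile time outright. The sketch below is a standard-library-only model, not the X3 parser: parse_frame_model is a name invented here, it needs C++17, and it returns 0 for truncated input purely to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of the frame reader: tag byte, then (tag & 0b111) payload bytes
// folded with a one-bit shift per byte, as in the parsers above.
template <std::size_t N>
constexpr std::uint32_t parse_frame_model(std::array<std::uint8_t, N> const& in) {
    static_assert(N > 0, "a frame needs at least the tag byte");
    std::size_t const len = in[0] & 0b111;
    if (len > N - 1)
        return 0; // too short; a real parser would report failure instead
    std::uint32_t val = 0;
    for (std::size_t i = 1; i <= len; ++i) {
        val <<= 1;
        val += in[i];
    }
    return val;
}

// Evaluated entirely by the compiler: the build fails if this is not 84.
static_assert(parse_frame_model(std::array<std::uint8_t, 3>{0b0010, 42, 0}) == 84);

// The same function is also usable at run time.
inline void check_parse_frame_model() {
    assert(parse_frame_model(std::array<std::uint8_t, 4>{0b0011, 42, 0, 0}) == 168u);
}
```

Whether a compiler folds the Spirit version the same way depends on the optimization level and on how much of the parser it can inline; the model only shows that nothing in the arithmetic itself prevents it.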

¹ proven by finger dipping