geometrian

Reputation: 15387

Creating a `va_list` Using a Pointer of Packed Arguments on Clang and g++

I am working on a cycle-accurate simulator for a research architecture. I already have a cross-compiler that generates assembly (based on MIPS). For debug purposes, we have a printf intrinsic which ultimately, when run in the simulator, calls a builtin method that has access to a list of arguments packed in a contiguous array (such as would be created by this code):

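#include <cstddef>
#include <cstring>

// Append value's raw bytes to buffer at offset *i (no padding or alignment),
// then advance *i past them.
template <typename type> inline static void insert(char buffer[], std::size_t* i, type value) {
    std::memcpy(buffer+*i,&value, sizeof(type)); *i+=sizeof(type);
}
int main(int /*argc*/, char* /*argv*/[]) {
    char buffer[512]; std::size_t i=0;
    // On x86-64 this packs an 8-byte double, a 4-byte int and an 8-byte pointer back to back.
    insert<double>(buffer,&i, 3.14);
    insert<int>(buffer,&i, 12345);
    insert<char const*>(buffer,&i, "Hello world!");

    return 0;
}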

In MSVC, one can then create a va_list and call vprintf like so:

union { va_list list; char* arguments; } un;
un.arguments = buffer;
vprintf(format_string, un.list);

The target architecture is x86-64, and the va_list provided by MSVC is just a typedef for char*, so this appears to produce correct results.

However, with g++ (and presumably Clang; I haven't tried), the code segfaults. This happens because the underlying type of va_list is different, as the compiler error you get if you simply write un.list=buffer; suggests. The type is compiler-provided: in gcc 4.9.2, va_list appears to be a typedef of __gnuc_va_list, which is in turn a typedef of __builtin_va_list, presumably a compiler intrinsic.


My question is: what is the cleanest way to convert this array of packed arguments into a va_list that is usable by both g++ and Clang in x86-64 mode?

My current thinking is that it may be better to parse out each format specifier individually, then forward it, together with the appropriate argument, to printf. This is less robust (in the sense of not supporting every printf feature; working on a single architecture only is robust enough for our purposes), and it isn't particularly compelling either.

Upvotes: 3

Views: 732

Answers (2)

Yakk - Adam Nevraumont

Reputation: 275385

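#include <cstddef>
#include <cstdio>
#include <cstring>
#include <tuple>
#include <utility>

// A cursor into the packed argument bytes.
struct buffer {
  const char* ptr = nullptr;
  std::size_t count = 0; // bytes remaining
};
// Copy the next T out of the buffer and advance the cursor.
// memcpy avoids dereferencing a possibly misaligned pointer.
template<class T>
T get_arg( buffer& b ) {
  T r;
  std::memcpy(&r, b.ptr, sizeof(T));
  b.ptr += sizeof(T);
  b.count -= sizeof(T);
  return r;
}
template<class...Ts, std::size_t...Is>
void print( const char* format, std::index_sequence<Is...>, buffer& b ) {
  std::tuple<Ts...> tup;
  // Elements of a braced-init-list are evaluated left to right,
  // so the arguments are extracted in order.
  using discard=int[];
  (void)discard{0,(
    std::get<Is>(tup) = get_arg<Ts>(b)
  ,void(),0)...};
  std::printf( format, std::get<Is>(tup)... );
}
template<class...Ts>
void print( const char* format, buffer& b ) {
  // Ts cannot be deduced from the arguments, so pass it explicitly.
  print<Ts...>(format, std::index_sequence_for<Ts...>{}, b);
}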

The above, given a bundle of types <Ts...> and a buffer, will call printf( format, ts... ) where ts... are the data extracted from the buffer.

The next step is to extract the %[flags][width][.precision][length]specifier format commands one at a time. Take a substring containing only one of these commands, and feed it to the above.
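A minimal sketch of that extraction step, assuming the format string is well formed; spec_length is a hypothetical helper, not part of the code above. It scans from a '%' to the first conversion character, which works because none of the flag, width, precision or length characters is itself a conversion character.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: given fmt pointing at a '%', return the length of one
// conversion specification %[flags][width][.precision][length]specifier,
// counting the '%' and the specifier. Returns 0 if the string ends first.
std::size_t spec_length(const char* fmt) {
  static const char kSpecifiers[] = "diouxXfFeEgGaAcspn%";
  std::size_t i = 1;  // skip the leading '%'
  while (fmt[i] != '\0') {  // never pass '\0' to strchr (it would match)
    if (std::strchr(kSpecifiers, fmt[i]) != nullptr) {
      return i + 1;
    }
    ++i;
  }
  return 0;
}
```

The one-command substring to feed to print is then std::string(fmt, spec_length(fmt)) (with <string> included).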

Count how many * entries it contains, and request that many ints (each * supplies a width or a precision) before the value itself.
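A small sketch of the counting step; count_star_args is a hypothetical name. In printf, each * in a width or precision position consumes one int argument, so "%*.*f" needs two ints followed by a double.

```cpp
#include <cstddef>

// Hypothetical helper: count the '*' characters in one conversion
// specification of length len. Each '*' means one extra int argument
// (width or precision) that precedes the value being formatted.
int count_star_args(const char* spec, std::size_t len) {
  int stars = 0;
  for (std::size_t i = 0; i < len; ++i) {
    if (spec[i] == '*') ++stars;
  }
  return stars;
}
```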

Finally, the length and specifier are mapped to a C++ type.
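A sketch of that mapping, covering only the "", "l", "ll" and "L" length modifiers; the ArgType enum and arg_type function are hypothetical. It assumes the buffer stores values the way the question's insert<> calls do (for example, %c as a one-byte char), which is a convention of this simulator rather than of printf itself.

```cpp
#include <cstring>

// Hypothetical tag for the C++ type one conversion reads from the buffer.
enum class ArgType { Int, Long, LongLong, UInt, ULong, ULongLong,
                     Double, LongDouble, Char, CString, Pointer, Invalid };

// Map a length modifier (e.g. "", "l", "ll", "L") and a specifier character
// to the stored type. Unhandled combinations (h, hh, j, z, t, ...) are Invalid.
ArgType arg_type(const char* length, char specifier) {
  const bool none  = length[0] == '\0';
  const bool l     = std::strcmp(length, "l") == 0;
  const bool ll    = std::strcmp(length, "ll") == 0;
  const bool cap_l = std::strcmp(length, "L") == 0;
  switch (specifier) {
    case 'd': case 'i':
      if (none) return ArgType::Int;
      if (l)    return ArgType::Long;
      if (ll)   return ArgType::LongLong;
      break;
    case 'u': case 'o': case 'x': case 'X':
      if (none) return ArgType::UInt;
      if (l)    return ArgType::ULong;
      if (ll)   return ArgType::ULongLong;
      break;
    case 'f': case 'F': case 'e': case 'E':
    case 'g': case 'G': case 'a': case 'A':
      if (none || l) return ArgType::Double;  // "%lf" also takes a double
      if (cap_l)     return ArgType::LongDouble;
      break;
    case 'c': if (none) return ArgType::Char;    break;  // buffer convention
    case 's': if (none) return ArgType::CString; break;
    case 'p': if (none) return ArgType::Pointer; break;
    default: break;
  }
  return ArgType::Invalid;
}
```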

The technique required to map runtime values to compile-time indexes (or C++ types) can be seen here, among other places.
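One common form of that technique is a single switch that turns a runtime tag into a type tag and passes it to a generic lambda, so each case instantiates the lambda body for one type. A minimal sketch, assuming C++14 generic lambdas; Kind, type_tag, dispatch and print_one are hypothetical names, and only three types are shown.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

template <class T> struct type_tag { using type = T; };

enum class Kind { Int, Double, CString };

// Convert the runtime Kind into a compile-time type by switching once and
// calling f with the matching type_tag<T>.
template <class F>
auto dispatch(Kind k, F&& f) -> decltype(f(type_tag<int>{})) {
  switch (k) {
    case Kind::Int:     return f(type_tag<int>{});
    case Kind::Double:  return f(type_tag<double>{});
    case Kind::CString: return f(type_tag<const char*>{});
  }
  return f(type_tag<int>{});  // not reached for valid Kind values
}

// Read one value of the dispatched type from the packed bytes at cursor,
// format it with a single-conversion spec into out, and advance cursor.
int print_one(Kind k, const char* spec, const char*& cursor,
              char* out, std::size_t n) {
  return dispatch(k, [&](auto tag) {
    using T = typename decltype(tag)::type;
    T value;
    std::memcpy(&value, cursor, sizeof(T));
    cursor += sizeof(T);
    return std::snprintf(out, n, spec, value);
  });
}
```

Writing to a caller-supplied array via snprintf keeps the sketch testable; swapping in printf gives the console behaviour the answer describes.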

This has the downside that upwards of 150 functions get generated.

As a side benefit, you can actually check that your buffer has enough data, and throw or exit if you run out instead of reading bad memory.
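A sketch of that check, assuming count holds the number of bytes remaining as in the buffer struct above; checked_buffer and get_arg_checked are hypothetical names, and throwing std::out_of_range is just one possible policy.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Cursor over the packed argument bytes; count is the number of bytes left.
struct checked_buffer {
  const char* ptr = nullptr;
  std::size_t count = 0;
};

// Copy the next T out of the buffer, or throw if fewer than sizeof(T)
// bytes remain, instead of reading past the end.
template <class T>
T get_arg_checked(checked_buffer& b) {
  if (b.count < sizeof(T)) {
    throw std::out_of_range("packed argument buffer exhausted");
  }
  T value;
  std::memcpy(&value, b.ptr, sizeof(T));
  b.ptr += sizeof(T);
  b.count -= sizeof(T);
  return value;
}
```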

Upvotes: 0

geometrian

Reputation: 15387

For a baseline answer, here is some simple code (reasonably well tested, but no guarantees) that implements the parse-the-format-string method I mentioned. I release it into the public domain.

If someone writes an answer that actually solves the problem I asked (doing this, but using va_list; i.e., a much cleaner solution) then I will accept that answer instead.

#include <assert.h>
#include <stdio.h>
#include <string.h>

static void printf_buffer(char const*__restrict format_string, char*__restrict argument_buffer) {
    int num_chars = 0;
    PARSE_CHAR:
        switch (*format_string) {
            case '\0': return;
            case '%': {
                int i = 1;
                char c;
                PARSE_SPECIFIER:
                    c = format_string[i++];
                    switch (c) {
                        case 'd': case 'i':
                        case 'u': case 'o': case 'x': case 'X':
                        case 'f': case 'F': case 'e': case 'E': case 'g': case 'G': case 'a': case 'A':
                        case 'c': case 's': case 'p':
                            goto PRINT_SPECIFIER;
                        case 'n':
                            assert(i==2 && "\"%n\" must contain no intermediary characters!");
                            **reinterpret_cast<int**>(argument_buffer) = num_chars;
                            argument_buffer += sizeof(int*);
                            goto DONE_SPECIFIER;
                        case '%':
                            assert(i==2 && "\"%%\" must contain no intermediary characters!");
                            putchar('%'); ++num_chars;
                            goto DONE_SPECIFIER;
                        case '\0':
                            assert(false && "Expected specifier before end of string!");
                            return; // malformed format: stop instead of reading past the end
                        default: goto PARSE_SPECIFIER; // flags, width, precision, length: keep scanning
                    }
                PRINT_SPECIFIER: {
                    // Copy just this one conversion (e.g. "%+8.4f") into its own string.
                    char* temp = new char[i+1];
                    strncpy(temp,format_string,i); temp[i]='\0';
                    // Print one argument of TYPE with the single-conversion format, then advance.
                    #define PRINTBRK(TYPE) num_chars+=printf(temp,*reinterpret_cast<TYPE*>(argument_buffer)); argument_buffer+=sizeof(TYPE); break;
                    switch (c) {
                        case 'd': case 'i': PRINTBRK(int)
                        case 'u': case 'o': case 'x': case 'X': PRINTBRK(unsigned int)
                        case 'f': case 'F': case 'e': case 'E': case 'g': case 'G': case 'a': case 'A': PRINTBRK(double)
                        case 'c': PRINTBRK(char)
                        case 's': PRINTBRK(char const*)
                        case 'p': PRINTBRK(void*)
                        default: assert(false && "Implementation error!");
                    }
                    #undef PRINTBRK
                    delete [] temp;
                }
                DONE_SPECIFIER:
                    format_string += i;
                    break;
            }
            default:
                putchar(*format_string); ++format_string; ++num_chars;
                break;
        }
        goto PARSE_CHAR;
}

Here is a link to the full source, including an enclosing test: link. Expected output:

double: 3.1400, float: +3.1400, getting characters: ->, percent: %, int:      12345, string: "Hello world!"
Printed 54 characters before the marked point:
                                                      <-

Upvotes: 1
