Reputation: 15387
I am working on a cycle-accurate simulator for a research architecture. I already have a cross-compiler that generates assembly (based on MIPS). For debugging, we have a printf intrinsic that, when run in the simulator, ultimately calls a built-in method with access to a list of arguments packed into a contiguous array (such as the one created by this code):
#include <stddef.h>
#include <string.h>

// Appends the raw bytes of value to buffer at offset *i, then advances *i (no padding).
template <typename type> inline static void insert(char buffer[], size_t* i, type value) {
    memcpy(buffer+*i, &value, sizeof(type));
    *i += sizeof(type);
}
int main(int /*argc*/, char* /*argv*/[]) {
    char buffer[512]; size_t i=0;
    insert<double>(buffer,&i, 3.14);
    insert<int>(buffer,&i, 12345);
    insert<char const*>(buffer,&i, "Hello world!");
    return 0;
}
In MSVC, one can then create a va_list and call vprintf like so:
union { va_list list; char* arguments; } un;
un.arguments = buffer;
vprintf(format_string, un.list);
The target architecture is x86-64, which is based on x86, so this produces apparently correct results (the va_list provided by MSVC is just a typedef for char*).
However, on g++ (and presumably Clang; I haven't tried), the code segfaults. This happens because the underlying type is different, as the compiler error you get if you just write un.list=buffer; foreshadows. The type is compiler-provided: in gcc 4.9.2, it appears to be typedefed from __gnuc_va_list, which is in turn typedefed from __builtin_va_list, presumably a compiler intrinsic.
My question is: what is the cleanest way to convert this array of packed arguments into a va_list that is usable by both g++ and Clang in x86-64 mode?
My current thinking is that it may be better to parse out each format specifier individually, then forward it to printf along with the appropriate argument. This isn't as robust (in the sense of supporting all features of printf; working on a single architecture only is robust enough for our purposes), nor is it particularly compelling, though.
Reputation: 275385
#include <cstddef>
#include <cstdio>
#include <tuple>
#include <utility>

struct buffer {
    const char* ptr = 0;
    std::size_t count = 0;
};
template<class T>
T const* get_arg( buffer& b ) {
    T const* r = reinterpret_cast<T const*>(b.ptr);
    b.ptr += sizeof(T);
    b.count -= sizeof(T);
    return r;
}
template<class...Ts, std::size_t...Is>
void print( const char* format, std::index_sequence<Is...>, buffer& b ) {
    std::tuple<Ts const*...> tup;
    using discard=int[];
    // The braced list forces get_arg to run left to right, in buffer order.
    (void)discard{0,(
        std::get<Is>(tup) = get_arg<Ts>(b)
    ,void(),0)...};
    std::printf( format, (*std::get<Is>(tup))... );
}
template<class...Ts>
void print( const char* format, buffer& b ) {
    // Ts... cannot be deduced from the arguments, so pass it explicitly.
    print<Ts...>(format, std::index_sequence_for<Ts...>{}, b);
}
The above, given a bundle of types <Ts...> and a buffer, will call printf( format, ts... ) where ts... are the data extracted from the buffer.
The next step is to extract the %[flags][width][.precision][length]specifier format commands one at a time. Take a substring containing only one of these commands, and feed it to print.
Count how many * entries are in there, and, based on that number, ask for that many ints.
Finally, the length and specifier are mapped to a C++ type.
The technique required to map runtime values to compile time indexes (or C++ types) can be seen here among other spots.
This has the downside that upwards of 150 functions get generated.
As a side benefit, you can actually check that your buffer has enough data, and throw or exit if you run out instead of reading bad memory.
Reputation: 15387
For a baseline answer, here is some simple code (reasonably well tested, but no guarantees) that implements the parse-the-format-string method I mentioned. I release it into the public domain.
If someone writes an answer that actually solves the problem I asked (doing this, but using va_list; i.e., a much cleaner solution), then I will accept that answer instead.
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Prints format_string, taking each argument from the packed argument_buffer.
// Length modifiers (h, l, ll, ...) do not change the argument type that is read,
// and '*' widths/precisions are not supported.
static void printf_buffer(char const*__restrict format_string, char*__restrict argument_buffer) {
    int num_chars = 0;
    PARSE_CHAR:
    switch (*format_string) {
        case '\0': return;
        case '%': {
            int i = 1;
            char c;
            PARSE_SPECIFIER:
            c = format_string[i++];
            switch (c) {
                case 'd': case 'i':
                case 'u': case 'o': case 'x': case 'X':
                case 'f': case 'F': case 'e': case 'E': case 'g': case 'G': case 'a': case 'A':
                case 'c': case 's': case 'p':
                    goto PRINT_SPECIFIER;
                case 'n':
                    assert(i==2 && "\"%n\" must contain no intermediary characters!");
                    **reinterpret_cast<int**>(argument_buffer) = num_chars;
                    argument_buffer += sizeof(int*);
                    goto DONE_SPECIFIER;
                case '%':
                    assert(i==2 && "\"%%\" must contain no intermediary characters!");
                    putchar('%'); ++num_chars;
                    goto DONE_SPECIFIER;
                case '\0':
                    assert(false && "Expected specifier before end of string!");
                    return; // don't read past the terminator when asserts are disabled
                default: goto PARSE_SPECIFIER;
            }
            PRINT_SPECIFIER: {
                // Copy just this one command (e.g. "%+.4f") into its own string.
                char* temp = new char[i+1];
                strncpy(temp,format_string,i); temp[i]='\0';
                #define PRINTBRK(TYPE) num_chars+=printf(temp,*reinterpret_cast<TYPE*>(argument_buffer)); argument_buffer+=sizeof(TYPE); break;
                switch (c) {
                    case 'd': case 'i': PRINTBRK(int)
                    case 'u': case 'o': case 'x': case 'X': PRINTBRK(unsigned int)
                    case 'f': case 'F': case 'e': case 'E': case 'g': case 'G': case 'a': case 'A': PRINTBRK(double)
                    case 'c': PRINTBRK(char)
                    case 's': PRINTBRK(char const*)
                    case 'p': PRINTBRK(void*)
                    default: assert(false && "Implementation error!");
                }
                #undef PRINTBRK
                delete [] temp;
            }
            DONE_SPECIFIER:
            format_string += i;
            break;
        }
        default:
            putchar(*format_string); ++format_string; ++num_chars;
            break;
    }
    goto PARSE_CHAR;
}
Here is a link to the full source, including an enclosing test: link. Expected output:
double: 3.1400, float: +3.1400, getting characters: ->, percent: %, int: 12345, string: "Hello world!"
Printed 54 characters before the marked point:
<-