Reputation: 19

Convert String to Type in C language

Prototype: size_t offsetof(type, member);

I know the first parameter is a type, but what if I only have the name as a string, not the type itself? I want to get the offset of a member using only string literals.

How can I achieve this?

ex:

#include <stdio.h>
#include <stddef.h>

typedef struct example_ {
    void *member1;
    void *member2;
} example;

unsigned int
offset_gen(char *ds, char *member)
{
    return (offsetof(ds, member));
}

void
main()
{
    printf ("\n %d", offset_gen("example", "member1"));
    printf ("\n %d", offset_gen("example", "member2"));
}

Upvotes: 0

Answers (2)

skrtbhtngr

Reputation: 2251

I think you are using offsetof in the wrong way. You have to pass the structure type and the member names to the macro, and not strings containing their names.

For instance, if your struct is:

typedef struct example_ {
    void *member1;
    void *member2;
}example;

Then, you can calculate the offset of member1 as:

offsetof(example, member1)

But if you still want to use string literals as you have, then you have to compare the member parameter in your offset_gen with the struct member names manually and call the corresponding macro.

Example:

#include <stddef.h>     /* offsetof */
#include <string.h>     /* strcmp */

unsigned int
offset_gen(char *ds, char *member)
{
    if(!strcmp(ds,"example"))
    {
        if(!strcmp(member,"member1"))
            return (offsetof(example, member1));
        else if(!strcmp(member,"member2"))
            return (offsetof(example, member2));
    }
    return -1;       // if no match for input parameters is found (wraps to UINT_MAX)
}
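If the chain of strcmp() calls grows, the same idea can be written as a lookup table. The following is only a minimal sketch, not part of the original answer: the names offset_entry, OFFSET_ENTRY, offset_table and offset_lookup are made up for illustration, and the caller-chosen not_found value replaces the -1 return above.

```c
#include <stddef.h>
#include <string.h>

typedef struct example_ {
    void *member1;
    void *member2;
} example;

/* One table row: both names as strings, plus the compile-time offset. */
struct offset_entry {
    const char *type;
    const char *member;
    size_t      offset;
};

/* Hypothetical helper macro: stringifies both names, so each
   type/member pair is written only once. */
#define OFFSET_ENTRY(type, member) { #type, #member, offsetof(type, member) }

static const struct offset_entry offset_table[] = {
    OFFSET_ENTRY(example, member1),
    OFFSET_ENTRY(example, member2),
};

/* Returns the offset, or not_found if the pair is not in the table. */
size_t offset_lookup(const char *type, const char *member, size_t not_found)
{
    size_t i;
    for (i = 0; i < sizeof offset_table / sizeof offset_table[0]; i++)
        if (!strcmp(offset_table[i].type, type) &&
            !strcmp(offset_table[i].member, member))
            return offset_table[i].offset;
    return not_found;
}
```

A call such as offset_lookup("example", "member2", (size_t)-1) then returns the same value as offsetof(example, member2). The table still has to list every pair by hand; C has no way to discover them at run time.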

You can even try something like this, or even this (if you want to test your limits!).

Upvotes: 1

Nominal Animal

Reputation: 39426

Here is a real-world example on how one could begin to implement this.

Note: This is not application-ready code. I wrote this from scratch, so it should be considered only a proof of concept; something one would use as a basis for a discussion in a development team. This version does not use a proper C parser, but assumes certain conventions used in the C source.

All of the files included in this post are licensed under CC0, i.e. dedicated to public domain. Remember, however, that there are no guarantees: if it breaks or breaks something else, don't blame me.

Essentially, we use a Bash+Awk script to generate a C program, that when compiled and run, generates a hash table with precalculated data, and a member_offset() function one can use to find member offsets of structure types, with structure type and member name given as strings.

For illustration, this is a complete working example, including a Makefile.

File mytypes.h contains the types we are interested in:

#include <stdlib.h>

struct type1 {
    char         one, two[2];
    float        three;
    int        (*callback)(const char *, void *, size_t);
} __attribute__((__packed__));

struct type2 {
    char         four;
    struct type1 five;
    int          six, seven[3];
};

You don't need to stuff the types into a single header file; you only need to edit the Makefile if you have them in different files. One requirement, however, is that all types are defined in header files that can be #include'd in the intermediate C generator program, which is compiled and run at build time only.

For illustration, we have a main.c that lets the user specify struct type and member name pairs on the command line, with each offset printed to standard output:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

extern size_t member_offset(const char *type, const char *name, const size_t not_found);

int main(int argc, char *argv[])
{
    int    arg;
    size_t offset;

    if (argc < 3 || !(argc & 1) || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s TYPE NAME [ TYPE NAME ... ]\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

    for (arg = 1; arg < argc - 1; arg += 2) {
        offset = member_offset(argv[arg], argv[arg + 1], ~(size_t)0);
        if (errno) {
            fprintf(stderr, "struct %s unknown, or has no member %s.\n", argv[arg], argv[arg + 1]);
            return EXIT_FAILURE;
        }

        printf("struct %s has member %s at offset %zu.\n", argv[arg], argv[arg + 1], offset);
        fflush(stdout);
    }

    return EXIT_SUCCESS;
}

To build the project, we use a Makefile. Note that the indents are Tabs, not spaces; make is picky that way.

CC      := gcc
CFLAGS  := -Wall -O2
LDFLAGS :=

.PHONY: all clean

all: clean example

clean:
    rm -f *.o example member-offset.c member-offset-generator.c member-offset-generator

member-offset.c: mytypes.h
    rm -f $@ member-offset-generator member-offset-generator.c
    ./member-offset-generator.bash mytypes.h:type1 mytypes.h:type2 > member-offset-generator.c
    $(CC) $(CFLAGS) member-offset-generator.c $(LDFLAGS) -o member-offset-generator
    ./member-offset-generator > $@
    rm -f member-offset-generator member-offset-generator.c

%.o: %.c
    $(CC) $(CFLAGS) -c $^

example: member-offset.o main.c
    $(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@

Note the member-offset.c rule above. It refers to the autogenerated C source file that will contain the member_offset() function. The file is regenerated if it does not exist yet, and also whenever mytypes.h is modified.

The command ./member-offset-generator.bash mytypes.h:type1 mytypes.h:type2 > member-offset-generator.c uses the fourth file not shown yet (see further below), to examine mytypes.h, and include struct type1 and struct type2 in the type database hash tables. The output is member-offset-generator.c, a C program that when compiled and run, generates the C code we actually want. It might be better to split this rule into separate rules, but for now, I made it automatically compile and run member-offset-generator.c, and delete it (as it is only needed to output member-offset.c once).

The shell script that generates that intermediate C program, member-offset-generator.bash, is pretty complicated:

#!/bin/bash
export LANG=C LC_ALL=C

[ -n "$CC"     ] || export CC="gcc"
[ -n "$CFLAGS" ] || export CFLAGS="-Wall -O2"

if [ $# -lt 1 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    exec >&2
    printf '\n'
    printf 'Usage: %s [ -h | --help ]\n' "$0"
    printf '       %s HEADER[:TYPE] ...\n' "$0"
    printf '\n'
    printf 'This script autogenerates a C program, that when run,\n'
    printf 'emits a C implementation of function member_offset()\n'
    printf 'which returns the offset of "member" within type "struct type".\n'
    printf '\n'
    printf 'The generated C program includes all HEADER files,\n'
    printf 'but each one only once. Only the specified struct types\n'
    printf 'will be supported by the final function.\n'
    printf '\n'
    exit 1
fi

function hash_of_function() {
    sed -e 's|        ||' << END
        /* DJB2 xor hash, http://www.cse.yorku.ca/~oz/hash.html */
        size_t hash_of(const void *data, const size_t size)
        {
            const unsigned char       *p = (const unsigned char *)data;
            const unsigned char *const q = (const unsigned char *)data + size;
            size_t                     h = 5381;
            while (p < q)
                h = ((h << 5) + h) ^ (*(p++));
            return h;
        }
END
}

# Emit all headers as includes, but each one only once.
printf '%s\n' "$@" | awk \
   'BEGIN {
        RS="\n"
        FS=":"
        split("", seen)

        printf "#include <stdlib.h>\n"
        printf "#include <stddef.h>\n"
        printf "#include <string.h>\n"
        printf "#include <stdio.h>\n"
        seen["stdlib.h"] = 1
        seen["stddef.h"] = 1
        seen["string.h"] = 1
        seen["stdio.h"] = 1
    }
    {
        header = $1
        sub(/^[<"]/, "", header)
        sub(/[>"]$/, "", header)
        if (length(header) > 0 && !(header in seen)) {
            seen[header] = 1
            if (substr($1, 1, 1) == "<")
                printf "#include <%s>\n", header
            else
                printf "#include \"%s\"\n", header
        }
    }'

# emit the hash function as a string.
printf '\nstatic const char hash_of_def[] =\n'
hash_of_function | sed -e 's|\\|\\\\|g; s|"|\\"|g; s|^|    "|g; s|[\t\v\f ]*$|\\n"|g'
printf '    ;\n\n'
# and the hash function itself.
hash_of_function

# emit structures and code used by the generator itself.
sed -e 's|^    ||' <<END

    struct type_member_list {
        struct type_member_list *next;
        size_t                   offset;
        size_t                   hash;
        size_t                   namelen;
        char                     name[];
    };

    struct type_list {
        struct type_list        *next;
        struct type_member_list *members;
        size_t                   hash;
        size_t                   slots;
        size_t                   typelen;
        char                     type[];
    };

    static size_t type_list_size(const struct type_list *list)
    {
        size_t result = 0;
        while (list) {
            ++result;
            list = list->next;
        }
        return result;
    }

    static size_t type_member_list_size(const struct type_member_list *list)
    {
        size_t result = 0;
        while (list) {
            ++result;
            list = list->next;
        }
        return result;
    }


    static struct type_list *types = NULL;


    static void add_type_member(const char *type, const char *name, const size_t offset)
    {
        const size_t typelen = (type) ? strlen(type) : 0;
        const size_t namelen = (name) ? strlen(name) : 0;

        struct type_list        *list = NULL, *temp;
        struct type_member_list *member;

        if (!typelen || !namelen) {
            if (!typelen)
                fprintf(stderr, "Error: add_type_member() called with empty type.\n");
            if (!namelen)
                fprintf(stderr, "Error: add_type_member() called with empty name.\n");
            exit(EXIT_FAILURE);
        }

        /* Find the list for the specified type. */
        for (temp = types; temp != NULL; temp = temp->next)
            if (temp->typelen == typelen && !strcmp(temp->type, type)) {
                list = temp;
                break;
            } 

        /* If this is a new type, create a new list. */
        if (!list) {
            list = malloc(sizeof (struct type_list) + typelen + 1);
            if (!list) {
                fprintf(stderr, "Error: Out of memory.\n");
                exit(EXIT_FAILURE);
            }
            memcpy(list->type, type, typelen);
            list->type[typelen] = '\0';
            list->typelen = typelen;
            list->hash = hash_of(type, typelen);
            list->slots = 0;
            list->members = NULL;

            /* Prepend to global types list. */
            list->next = types;
            types = list;
        }

        /* Create a new member. */
        member = malloc(sizeof (struct type_member_list) + namelen + 1);
        if (!member) {
            fprintf(stderr, "Error: Out of memory.\n");
            exit(EXIT_FAILURE);
        }
        memcpy(member->name, name, namelen);
        member->name[namelen] = '\0';
        member->namelen = namelen;
        member->hash = hash_of(name, namelen);
        member->offset = offset;

        /* Prepend to member list. */
        member->next = list->members;
        list->members = member;
    }

    void add_types_and_members(void)
    {
END

ignorefirst=$'<"'
ignorelast=$'>"'

# Extract the member names from each structure.
for pair in "$@"; do
    name="${pair#*:}"
    [ ":$name" = ":$pair" ] && continue
    [ -n "$name" ] || continue

    file="${pair%%:*}"
    file="${file#[$ignorefirst]}"
    file="${file%[$ignorelast]}"

    $CC $CFLAGS -P -E "$file" | \
    sed -e '/#/ d' | tr -s '\t\n\v\f\r ' '      ' | \
    sed -e 's|\(struct [^ ]*\) {|\n\1 {\n|g; s|}|\n}\n|g; s| *;|\n|g; s|)([^)]*)||g' | \
    awk -v name="$name" \
   'BEGIN {
        RS = " *\n"
        FS = " *,"
        split("", members)
    }

    $0 == ("struct " name " {") {
        inside = 1
        next
    }

    $0 == "}" {
        inside = 0
        next
    }

    inside {
        for (i = 1; i <= NF; i++) {
            member = $i
            sub(/\[[^\[\]]*\]/, "", member)
            sub(/^.*[ \*(]/, "", member)
            if (!(member in members))
                members[member] = member
        }
    }

    END {
        for (member in members)
            printf "    add_type_member(\"%s\", \"%s\", offsetof(struct %s, %s));\n", name, member, name, member
    }' || exit 1
done

# emit the rest of the generator code.
sed -e 's|^    ||' <<END
    }

    size_t type_slots(struct type_list *list)
    {
        const size_t  size = type_list_size(list);
        const size_t  max_slots = 4 * size + 1;
        size_t        slots = size;
        size_t       *used, i, n;

        struct type_list *item;

        used = malloc(max_slots * sizeof used[0]);
        if (!used) {
            fprintf(stderr, "Error: Out of memory.\n");
            exit(EXIT_FAILURE);
        }

        while (1) {
            if (slots >= max_slots) {
                fprintf(stderr, "Error: Weak hash function; hash table grows too large.\n");
                fprintf(stderr, "       (Need more than %zu slots for %zu data entries.)\n", max_slots, size);
                exit(EXIT_FAILURE);
            }

            for (i = 0; i < slots; i++)
                used[i] = 0;

            for (item = list; item != NULL; item = item->next)
                ++used[item->hash % slots];

            n = used[0];
            for (i = 1; i < slots; i++)
                if (used[i] > n)
                    n = used[i];

            if (n <= 1) {
                free(used);
                return slots;
            }

            slots++;
        }
    }


    size_t generate_type(const char *type, struct type_member_list *list, const size_t size)
    {
        /* Maximum size for current hash table. */
        const size_t  max_slots = 4*size + 1;
        size_t        slots = size;
        size_t       *used, i, n;

        struct type_member_list *item;

        if (size < 1)
            return 0;

        used = malloc(max_slots * sizeof used[0]);
        if (!used) {
            fprintf(stderr, "Error: Out of memory.\n");
            exit(EXIT_FAILURE);
        }

        while (1) {

            if (slots >= max_slots) {
                fprintf(stderr, "Error: Weak hash function; hash table grows too large.\n");
                fprintf(stderr, "       (Need more than %zu slots for %zu data entries.)\n", max_slots, size);
                exit(EXIT_FAILURE);
            }

            /* Clear slot use counts. */
            for (i = 0; i < slots; i++)
                used[i] = 0;

            /* Count slot occupancies. */
            for (item = list; item != NULL; item = item->next)
                ++used[item->hash % slots];

            /* Find the maximum slot occupancy. */
            n = used[0];
            for (i = 1; i < slots; i++)
                if (used[i] > n)
                    n = used[i];

            /* Suitable size? */
            if (n <= 1)
                break;

            /* Try a larger hash table, then. */
            slots++;
        }

        free(used);

        /* Print out the contents of this hash table. */
        printf("static const struct member  struct_%s_members[%zu] = {\n", type, slots);
        for (i = 0; i < slots; i++) {
            for (item = list; item != NULL; item = item->next)
                if (item->hash % slots == i)
                    break;
            if (item) {
                printf("    { .offset  = %zu,\n", item->offset);
                printf("      .hash    = %zu,\n", item->hash);
                printf("      .namelen = %zu,\n", item->namelen);
                printf("      .name    = \"%s\" },\n", item->name);
            } else {
                printf("    { .offset  = 0,\n");
                printf("      .hash    = 0,\n");
                printf("      .namelen = 0,\n");
                printf("      .name    = NULL },\n");
            }
        }
        printf("};\n\n");

        return slots;
    }

    int main(void)
    {
        struct type_list *list;
        size_t            main_slots, i;

        add_types_and_members();

        printf("#include <stdlib.h>\n");
        printf("#include <string.h>\n");
        printf("#include <errno.h>\n");
        printf("\n");
        printf("struct member {\n");
        printf("    const size_t      offset;\n");
        printf("    const size_t      hash;\n");
        printf("    const size_t      namelen;\n");
        printf("    const char *const name;\n");
        printf("};\n");
        printf("\n");
        printf("struct type {\n");
        printf("    const size_t               hash;\n");
        printf("    const size_t               namelen;\n");
        printf("    const size_t               members;\n");
        printf("    const struct member *const member;\n");
        printf("    const char *const          name;\n");
        printf("};\n");
        printf("\n");
        printf("%s\n", hash_of_def);
        printf("\n");

        for (list = types; list != NULL; list = list->next)
            list->slots = generate_type(list->type, list->members, type_member_list_size(list->members));

        main_slots = type_slots(types);

        printf("static const size_t       num_types = %zu;\n", main_slots);
        printf("static const struct type  types[%zu] = {\n", main_slots);
        for (i = 0; i < main_slots; i++) {
            for (list = types; list != NULL; list = list->next)
                if (list->hash % main_slots == i)
                    break;

            if (list) {
                printf("    { .hash    = %zuUL,\n", list->hash);
                printf("      .namelen = %zu,\n", list->typelen);
                printf("      .members = %zu,\n", list->slots);
                printf("      .member  = struct_%s_members,\n", list->type);
                printf("      .name    = \"%s\" },\n", list->type);
            } else {
                printf("    { .hash    = 0,\n");
                printf("      .namelen = 0,\n");
                printf("      .members = 0,\n");
                printf("      .member  = NULL,\n");
                printf("      .name    = NULL },\n");
            }
        }
        printf("};\n");
        printf("\n");
        printf("size_t member_offset(const char *type, const char *name, const size_t not_found)\n");
        printf("{\n");
        printf("    const size_t  typelen = (type) ? strlen(type) : 0;\n");
        printf("    const size_t  namelen = (name) ? strlen(name) : 0;\n");
        printf("\n");
        printf("    if (typelen > 0 && namelen > 0) {\n");
        printf("        const size_t  typehash = hash_of(type, typelen);\n");
        printf("        const size_t  t = typehash %% num_types;\n");
        printf("        if (types[t].hash == typehash &&\n");
        printf("            types[t].namelen == typelen &&\n");
        printf("            !strcmp(types[t].name, type)) {\n");
        printf("            const size_t         namehash = hash_of(name, namelen);\n");
        printf("            const struct member *const member = types[t].member + (namehash %% types[t].members);\n");
        printf("            if (member->hash == namehash &&\n");
        printf("                member->namelen == namelen &&\n");
        printf("                !strcmp(member->name, name)) {\n");
        printf("                errno = 0;\n");
        printf("                return member->offset;\n");
        printf("            }\n");
        printf("        }\n");
        printf("    }\n");
        printf("    errno = ENOENT;\n");
        printf("    return not_found;\n");
        printf("}\n\n");

        return EXIT_SUCCESS;
    }
END

This version uses the DJB2 xor hash function, which is fast and simple. Whether it suffices for any real-world use cases, I don't know; for some test header files I threw at it, it worked just fine. If you want to use some other hash function, write it in C inside the hash_of_function Bash function: after the sed ... <<END line, and before the END at the start of a line. (The sed is there just to remove eight spaces of indentation, making the script slightly easier to read.)

Both the known structure types, and the members of each known structure type, are stored in hash tables. Since the entries are small, and this is done for performance gains, the hash tables have at most one entry per hash table slot, with some empty slots. This means at most two probes (one probe per table) per lookup. The intermediate C program searches for the smallest size (number of slots) that puts at most one type or member in each slot, so that simple arrays can be used. This yields constant time (O(1)) complexity for the hash table search itself. Because we do need to calculate the hashes of the two supplied strings, the total time still depends on their lengths. That is why you want a fast hash function; it does not need to be perfect or cryptographically secure.
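The search for the smallest collision-free table size can be sketched in isolation as follows. This is a simplified illustration, not the generator's actual code: the function name collision_free_slots is made up, it stops at the first collision instead of counting slot occupancies, and it returns 0 rather than exiting when no suitable size of at most 4*count slots exists.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* DJB2 xor hash, same recurrence as hash_of() in the script. */
static size_t hash_of(const void *data, size_t size)
{
    const unsigned char       *p = data;
    const unsigned char *const q = p + size;
    size_t                     h = 5381;
    while (p < q)
        h = ((h << 5) + h) ^ *p++;
    return h;
}

/* Smallest slot count in [count, 4*count] for which every name hashes
   to a different slot. Returns 0 if there is no such count, if count
   is zero, or if memory allocation fails. */
size_t collision_free_slots(const char *const names[], size_t count)
{
    const size_t   max_slots = 4 * count + 1;
    size_t         slots, i;
    unsigned char *used;

    if (count < 1)
        return 0;

    used = malloc(max_slots);
    if (!used)
        return 0;

    for (slots = count; slots < max_slots; slots++) {
        int collision = 0;

        /* Mark all slots of this candidate size as free. */
        memset(used, 0, slots);

        for (i = 0; i < count && !collision; i++) {
            const size_t slot = hash_of(names[i], strlen(names[i])) % slots;
            if (used[slot])
                collision = 1;
            else
                used[slot] = 1;
        }

        if (!collision) {
            free(used);
            return slots;
        }
    }

    free(used);
    return 0;
}
```

The 4*count upper limit mirrors the max_slots bound in the script; it is a guard against a hash function that distributes the given names poorly, not a guarantee that a suitable size exists.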

The one probe to each hash table first compares the hash, then the string length, and finally the string itself, to ensure no false matches. This means that when a match is found, exactly two strcmp()s are made.
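The order of the three checks can be sketched as a standalone helper. The struct slot and probe() names are hypothetical, the caller is assumed to have computed the hash and length already, and the explicit NULL test for empty slots is an addition of this sketch (the generated code relies on the length check instead).

```c
#include <stddef.h>
#include <string.h>

struct slot {
    size_t      hash;
    size_t      namelen;
    size_t      offset;
    const char *name;       /* NULL marks an empty slot */
};

/* Looks only at the single slot selected by hash % slots. Returns 1 and
   stores the offset if that slot holds exactly this name, 0 otherwise. */
int probe(const struct slot *table, size_t slots,
          const char *name, size_t namelen, size_t hash, size_t *offset)
{
    const struct slot *const s = table + hash % slots;

    if (s->name != NULL &&
        s->hash == hash &&          /* cheapest check first */
        s->namelen == namelen &&    /* then the length */
        !strcmp(s->name, name)) {   /* full string compare last */
        *offset = s->offset;
        return 1;
    }
    return 0;
}
```

Because the && operators short-circuit, strcmp() is reached only when both the hash and the length already agree, so a miss rarely costs a string comparison.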

If you know the function will never be called to find the offset of a non-existent member, or with a non-existent type, you can safely omit the strcmp() checks.

You can examine the generated intermediate program by running

./member-offset-generator.bash mytypes.h:type1 mytypes.h:type2 | less

As you have probably noticed by now, writing a C program that generates C code is complicated, and writing a script that generates a C program that generates C code is typically not worth the maintenance effort. It is definitely doable, but there is a high risk that maintaining the script requires more effort than the generated code is worth. Be aware of this risk.

The default action in the Makefile (when you run make) is the same as make clean example. If you save all the above to their respective files, and then run

make

you should see something like

rm -f *.o example member-offset.c member-offset-generator.c member-offset-generator
rm -f member-offset.c member-offset-generator member-offset-generator.c
./member-offset-generator.bash mytypes.h:type1 mytypes.h:type2 > member-offset-generator.c
gcc -Wall -O2 member-offset-generator.c  -o member-offset-generator
./member-offset-generator > member-offset.c
rm -f member-offset-generator member-offset-generator.c
gcc -Wall -O2 -c member-offset.c
gcc -Wall -O2 member-offset.o main.c  -o example

because make outputs the commands it runs, and I didn't hide any of them (by prepending the corresponding command with @).

Then, if you run

./example type1 one  type1 two  type1 three  type1 callback

the example program should output

struct type1 has member one at offset 0.
struct type1 has member two at offset 1.
struct type1 has member three at offset 3.
struct type1 has member callback at offset 7.

On x86-64, which is an LP64 architecture (int being 32-bit, and long and pointers 64-bit), running

./example type2 four type2 five type2 six type2 seven

outputs

struct type2 has member four at offset 0.
struct type2 has member five at offset 1.
struct type2 has member six at offset 16.
struct type2 has member seven at offset 20.

On x86-64, one can compile 32-bit code by using the -m32 GCC option. So, running

make CFLAGS="-Wall -O2 -m32" clean all

and then

./example type2 four type2 five type2 six type2 seven

outputs

struct type2 has member four at offset 0.
struct type2 has member five at offset 1.
struct type2 has member six at offset 12.
struct type2 has member seven at offset 16.
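The difference between 16 and 12 comes from the size of the function pointer inside the packed struct type1. As a rough cross-check, the sketch below predicts the offset of six from first principles; round_up() and predicted_offset_of_six() are names invented here, and the code assumes GCC or Clang (for the packed attribute) and a C11 compiler (for _Alignof).

```c
#include <stddef.h>

struct type1 {
    char         one, two[2];
    float        three;
    int        (*callback)(const char *, void *, size_t);
} __attribute__((__packed__));

struct type2 {
    char         four;
    struct type1 five;
    int          six, seven[3];
};

/* Round n up to the next multiple of align (align must be nonzero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Predicted offset of 'six' within struct type2. */
size_t predicted_offset_of_six(void)
{
    /* A packed struct has alignment 1, so 'five' starts right after
       the one-byte member 'four'. */
    const size_t end_of_five = 1 + sizeof (struct type1);

    /* 'six' is an int, so it starts at the next int-aligned offset. */
    return round_up(end_of_five, _Alignof(int));
}
```

With 8-byte pointers, struct type1 is 15 bytes, so five ends at offset 16, which is already aligned for an int. With 4-byte pointers it is 11 bytes, five ends at offset 12, and six lands there.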

This can be extended to allow for some kind of introspection, if we add support for the types of the structure members in the hash table entries.
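To give an idea of what that could look like, here is a small hand-written sketch where each entry also records the kind of the member, so a value can be read by name. Everything in it (struct point, enum member_kind, struct member_info, point_get()) is hypothetical and not produced by the script above; a linear search is used instead of a hash table to keep it short.

```c
#include <stddef.h>
#include <string.h>

struct point {
    int    id;
    double weight;
};

enum member_kind { KIND_INT, KIND_DOUBLE };

/* Per-member description: name, offset, and how to interpret the bytes. */
struct member_info {
    const char      *name;
    size_t           offset;
    enum member_kind kind;
};

static const struct member_info point_members[] = {
    { "id",     offsetof(struct point, id),     KIND_INT    },
    { "weight", offsetof(struct point, weight), KIND_DOUBLE },
};

/* Reads the named member of *p, converted to double.
   Returns 0 on success, -1 if the name is unknown. */
int point_get(const struct point *p, const char *name, double *value)
{
    size_t i;

    for (i = 0; i < sizeof point_members / sizeof point_members[0]; i++) {
        const struct member_info *const m  = point_members + i;
        const char *const               at = (const char *)p + m->offset;

        if (strcmp(m->name, name))
            continue;

        switch (m->kind) {
        case KIND_INT:    *value = *(const int *)at;    return 0;
        case KIND_DOUBLE: *value = *(const double *)at; return 0;
        }
    }
    return -1;
}
```

The kind tag is what makes the pointer cast safe: the offset alone says where the member is, but not how many bytes it occupies or how to interpret them.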

However, I cannot stress enough how important it is to consider the maintenance effort needed to keep this working. If the codebase has a strict set of coding standards, and someone knows this code-generator-generator well enough to regularly check it parses the structures correctly, and more than one developer can maintain it long-term, then sure; I don't see why one shouldn't use something like this. Otherwise, it may become a heavy burden that pulls down the rest of the project with it, especially if there is just one developer who has sufficient knowledge to maintain the code-generator-generator, and they happen to leave. No project should be dependent on a specific person, in my opinion.

If you have any specific questions, feel free to ask them in comments, and I'll try to explain. However, I will not explain the entire member-offset-generator.bash script line-by-line (as I have occasionally done in the past for other examples I've written), because at 435 lines, with its inherent inception-like (C code output by a C program created by the script) complexity, it is not worth the effort to anyone.

Upvotes: 2
