Bruce Duncan

Reputation: 269

Unions in C to handle passing pointers of multiple types to a function

I have some code where I want a generic function that takes pointers from the main code and manipulates the variable at the pointer address. The problem is that the pointers point to char, int and short variables. I want to do this without flags or otherwise keeping track of which type of pointer is being passed. My guess was that a typedef'd union of pointers could be used, and the function would then take an int pointer (int being the largest of the three types).

The below sort of works except with the char pointer. Is there a better way to do this?

    #include <stdio.h>

    void pointerfunction(int *p);
    int a=10;
    short b=20;
    char f=4;
    typedef union 
    {
    int *ptr1;
    short *ptr2;
    char *ptr3;
    }pointers;

    int main()
    {
    pointers mypointers;

    mypointers.ptr1=&a;
    pointerfunction(mypointers.ptr1);
    printf("%d\n", *(mypointers.ptr1));
    mypointers.ptr2=&b;
    pointerfunction(mypointers.ptr1);
    printf("%d\n", *(mypointers.ptr2));
    mypointers.ptr3=&f;
    pointerfunction(mypointers.ptr1);
    printf("%d\n", *(mypointers.ptr3));
    }


    void pointerfunction(int *p)
    {
    *p=*p*10;  
    }

Upvotes: 0

Views: 4083

Answers (6)

Richard Chambers

Reputation: 17693

Your idea of using a union is a good one; however, you will need an additional member alongside the union to indicate what kind of pointer the union actually contains.

For practically all modern mainstream CPU architectures, pointers on the same machine are the same size regardless of whether they are an int *, char *, or void *. There may be odd machines, such as the old segmented-memory Intel 8086 with its near pointers (16 bit offset) and far pointers (32 bits, composed of a 16 bit segment and a 16 bit offset), or other embedded processors, where this may be different.

The int and short cases appear to work when printed because arguments narrower than int, such as short and char, are promoted to int when they are passed to a variadic function like printf(), so printf() sees an int in each case.
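The promotion can be seen in isolation with a small sketch. The function name format_promoted is hypothetical and only serves the illustration; it formats a short and a char with %d, which is valid because both arguments are promoted to int before snprintf() receives them.

```c
#include <stdio.h>

/* Formats a short and a char with %d. Both arguments undergo the
   default argument promotions to int in the variadic call, so %d
   is the matching conversion for each of them. */
int format_promoted(char *out, size_t n, short s, char c)
{
    return snprintf(out, n, "%d %d", s, c);
}
```

Note that this only concerns how values are passed to printf(); it says nothing about writing to a short or char object through an int pointer, which is a separate problem.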

First I will describe a possible implementation. It is ugly, though, and not really the way to do this: it has a number of problems, not least that it relies on a switch on a type code, which reduces cohesion and increases coupling.

Then I will explore a second approach using C11 _Generic() which provides some interesting possibilities.

First Attempt using union with type identifier

A first attempt would be something like the following.

#define  POINTER_UNION_TYPE_CHAR   1
#define  POINTER_UNION_TYPE_INT    2
#define  POINTER_UNION_TYPE_SHORT  3

typedef struct {
  int   iType;
  union {
    char *pChar;
    int  *pInt;
    short *pShort;
  } u;
} Pointers;

When you use this struct you would do something like:

int iValue = 1;
Pointers  PointerThing;

PointerThing.u.pInt = &iValue;  PointerThing.iType = POINTER_UNION_TYPE_INT;

Then in your function using this you would do something like:

void pointer_funct (Pointers *pPointers)
{
   switch (pPointers->iType) {
      case  POINTER_UNION_TYPE_CHAR:
           // do things with char pointer pPointers->u.pChar
           break;
      case  POINTER_UNION_TYPE_INT:
           // do things with int pointer pPointers->u.pInt
           break;
      case  POINTER_UNION_TYPE_SHORT:
           // do things with short pointer pPointers->u.pShort
           break;
      default:
           break;
    }
}

A better way to do this is to have separate functions that do what is all combined into a single function. So in other words, you would have three different functions that each will handle a particular pointer type. That way the functionality that knows what the type is can just go ahead and call the appropriate function.

Exploring C11 _Generic

C11 added the _Generic selection expression, which lets you choose among a set of alternatives at compile time based on the type of an expression. See this SO post, https://stackoverflow.com/a/72475372/1466970

The problem is still that the pointer type stored in the union is determined at run time, so any decision about how to use the pointer is a run time decision and not a compile time decision.
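Where the caller still holds the typed pointer itself rather than the union, the choice can be made entirely at compile time. The following is a minimal sketch under that assumption; the names scale_int, scale_short, scale_char and scale_by_ten are hypothetical and do not appear elsewhere in this answer.

```c
/* Hypothetical per-type workers, one for each supported pointer type. */
static void scale_int(int *p)     { *p = *p * 10; }
static void scale_short(short *p) { *p = (short)(*p * 10); }
static void scale_char(char *p)   { *p = (char)(*p * 10); }

/* Picks the worker from the static type of p. A pointer type that is
   not listed (for example float *) is rejected at compile time. */
#define scale_by_ten(p) _Generic((p), \
        int *:   scale_int,           \
        short *: scale_short,         \
        char *:  scale_char           \
    )(p)
```

With int a = 10; short b = 20; char f = 4; the calls scale_by_ten(&a), scale_by_ten(&b) and scale_by_ten(&f) each resolve to the matching worker without any run time type tag.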

The following more modern C source code uses C11 _Generic.

Let's make a slight change to the pointers union.

// define the pointer union which contains the various pointer types along
// with an identifier to indicate what type of pointer is currently in the union.
typedef enum { POINTER_UNION_TYPE_UNKNOWN=0, POINTER_UNION_TYPE_CHAR=1, POINTER_UNION_TYPE_INT=2, POINTER_UNION_TYPE_SHORT=3} pointertype;

typedef struct
{
    pointertype id;    // indicates the type of pointer currently in the union.
    union {            // use an anonymous union so that we can say p.ptr rather than p.u.ptr
        void* ptr;         // this allows an assignment to a pointer of any type, e.g. int *a = t.ptr; char *c = t.ptr; etc.
        int* ptr1;
        short* ptr2;
        char* ptr3;
    };
}pointers;

Next let's define some helper functions along with several _Generic expressions in C Preprocessor macros to assist with: (1) creating the properly initialized pointer union, (2) testing whether a given pointer type is actually in the union, (3) converting the value pointed to into an int regardless of the type of the pointer.

// functions used with the macro cbrtp() following to build
// a pointer struct with the correct value and identifier.
pointers cbrtint(int * p) {
    pointers j = { .id= POINTER_UNION_TYPE_INT, .ptr1=p };   // use designated initializers, C99 and later

    printf("cbrtint called: POINTER_UNION_TYPE_INT\n");

    return j;
}

pointers cbrtshort(short * p) {
    pointers j = { .id = POINTER_UNION_TYPE_SHORT, .ptr2 = p };   // use designated initializers, C99 and later

    printf("cbrtshort called: POINTER_UNION_TYPE_SHORT\n");

    return j;
}

pointers cbrtchar(char* p) {
    pointers j = { .id = POINTER_UNION_TYPE_CHAR, .ptr3 = p };   // use designated initializers, C99 and later

    printf("cbrtchar called: POINTER_UNION_TYPE_CHAR\n");

    return j;
}


#define cbrtp(X) _Generic((X),     \
              int *: cbrtint, \
                    char *: cbrtchar,  \
                    short *: cbrtshort  \
              )(X)

// ----------------------------------------------------------------
// functions used with the assertcbrtp() macro and its _Generic below
// to test that the pointer type matches the specified type

int assertcbrtpint(pointers va) {
    return va.id == POINTER_UNION_TYPE_INT;
}

int assertcbrtpchar(pointers va) {
    return va.id == POINTER_UNION_TYPE_CHAR;
}

int assertcbrtpshort(pointers va) {
    return va.id == POINTER_UNION_TYPE_SHORT;
}

#define assertcbrtp(ty,va)  _Generic((ty),  \
            int *: assertcbrtpint,  \
            char *: assertcbrtpchar, \
            short *: assertcbrtpshort \
        )(va)

// --------------------------------------------

int convertcbrtint(pointers va) {
    return *va.ptr1;
}

int convertcbrtchar(pointers va) {
    return *va.ptr3;
}

int convertcbrtshort(pointers va) {
    return *va.ptr2;
}

#define convertcbrt(ty,va) _Generic((ty),     \
              int *: convertcbrtint, \
                    char *: convertcbrtchar,  \
                    short *: convertcbrtshort  \
              )(va)

And finally do a few tests. Here is the test harness for exercising the above functions and macros.

Notice the code at the end, which is disabled with #if 0 but will generate a compiler error if it is included. This compiler error would actually be a good way of finding some types of errors at compile time.

Also notice that we are using the convertcbrt() macro for conversion of the pointed to item to an int but also printing the value if we directly access the pointer through the int * ptr1 member of the union. You can see the difference in values in the output that follows.

    int a = 10;
    short b = 20;
    char f = 4;
    float q = 45.23f;

    pointers t;
    char* pointertypes[] = {
        "POINTER_UNION_TYPE_UNKNOWN",
        "POINTER_UNION_TYPE_CHAR",
        "POINTER_UNION_TYPE_INT",
        "POINTER_UNION_TYPE_SHORT"
    };


    int jy, jz;    // used to hold the assertcbrtp() type test results
    int kc;        // used for conversion to int of any pointer target.

    t = cbrtp(&a);     // create a pointers struct for a supported variable
    int* ap = t.ptr;
    kc = convertcbrt(ap, t);
    jy = assertcbrtp(ap, t);
    printf(" jy = %d  t.id = %d, %s    *ap = %d  kc = %d  *t.ptr1 = %d\n", jy, t.id, pointertypes[t.id], *ap, kc, *t.ptr1);

    t = cbrtp(&b);
    short* bp = t.ptr;
    kc = convertcbrt(bp, t);
    jy = assertcbrtp(bp, t);
    jz = assertcbrtp(ap, t);
    printf(" jy = %d jz = %d   t.id = %d, %s    *bp = %d  kc = %d  *t.ptr1 = %d\n", jy, jz, t.id, pointertypes[t.id], *bp, kc, *t.ptr1);

    t = cbrtp(&f);
    char* fp = t.ptr;
    kc = convertcbrt(fp, t);
    jy = assertcbrtp(fp, t);
    jz = assertcbrtp(ap, t) || assertcbrtp(bp, t);
    printf(" jy = %d jz = %d   t.id = %d, %s    *fp = %d  kc = %d  *t.ptr1 = %d\n", jy, jz, t.id, pointertypes[t.id], *fp, kc, *t.ptr1);

#if 0
    // following code using the float variable generates errors because the type is not part of the
    // _Generic(X) list.

    t = cbrtp(&q);       // generates compiler error of  "Error C7702   no compatible type for 'float *' in _Generic association list"

    float * qp = t.ptr;
    jy = assertcbrtp(qp, t);       // generates compiler error of  "Error   C7702   no compatible type for 'float *' in _Generic association list"
    jz = assertcbrtp(ap, t) || assertcbrtp(bp, t);
    printf(" jy = %d jz = %d   t.id = %d, %s    *qp = %f\n", jy, jz, t.id, pointertypes[t.id], *qp);
#endif

This produces the output shown below when using Visual Studio 2019. A couple of things to note first.

First of all, we can use the void * ptr member of the union to assign the pointer to a typed pointer. C is polite enough to allow this, while C++ does not and requires a cast.

Secondly, how we access the pointer in the union provides different values. Compare the value in kc with the value obtained by dereferencing the int *ptr1 member for all three types: reading a short or char object through the int * member yields garbage (formally, undefined behavior).

cbrtint called: POINTER_UNION_TYPE_INT
 jy = 1  t.id = 2, POINTER_UNION_TYPE_INT    *ap = 10  kc = 10  *t.ptr1 = 10
cbrtshort called: POINTER_UNION_TYPE_SHORT
 jy = 1 jz = 0   t.id = 3, POINTER_UNION_TYPE_SHORT    *bp = 20  kc = 20  *t.ptr1 = -859045868
cbrtchar called: POINTER_UNION_TYPE_CHAR
 jy = 1 jz = 0   t.id = 1, POINTER_UNION_TYPE_CHAR    *fp = 4  kc = 4  *t.ptr1 = -858993660

You might find it useful to add a further macro that performs assignments of the pointer value and returns the NULL pointer if the requested type is not the type of pointer stored in the union. This is similar to the behavior of the dynamic_cast<> conversion in C++. By using this you will know whether the pointer in the union is the specific type of pointer you expect.

int * assigncbrtint(pointers va) {
    return (va.id == POINTER_UNION_TYPE_INT) ? va.ptr1 : NULL;
}

char * assigncbrtchar(pointers va) {
    return (va.id == POINTER_UNION_TYPE_CHAR) ? va.ptr3 : NULL;
}

short * assigncbrtshort(pointers va) {
    return (va.id == POINTER_UNION_TYPE_SHORT) ? va.ptr2 : NULL;
}

#define assigncbrt(ty,va) _Generic((ty),     \
              int *: assigncbrtint, \
                    char *: assigncbrtchar,  \
                    short *: assigncbrtshort  \
              )(va)

This can be used as in:

int *ap = assigncbrt(ap, t);
if (ap == NULL) printf("  assigncbrt(ap,t) failed.\n"); else printf("  assigncbrt(ap,t) succeeded.\n");

// the following line of code assumes the pointer contained in the
// union of struct t is in fact an int pointer. if it isn't then
// the attempt to dereference NULL is undefined behavior (typically a crash).
int kc = *assigncbrt(&kc, t);   // assumes t contain an int * pointer

And finally, the first argument to the assigncbrt() macro, the type argument, doesn't need to be a typed variable. You can't use a bare type name such as int *, which causes an error of Error C2065 'iint': undeclared identifier with Visual Studio 2019, but you can use a cast of a constant such as (int *)NULL or (int*)1. This means a statement such as kckc = kc + *assigncbrt((int*)NULL, t) - 3; is allowed.

Another possible approach

Another approach is to use some object-oriented techniques. See this post on another, though similar, question.

Upvotes: 5

jxh

Reputation: 70502

It seems you want something like a template function. However, C does not support template functions or function overloading.

Since your three types have different sizes, you can infer the type from the size of the object the pointer points to. So you can use a macro to create the feel of an overloaded function.

#define pointerfunction(x) do { \
    switch (sizeof(*(x))) { \
    case sizeof(int):   pointerfunction_int((void *)(x));   break; \
    case sizeof(short): pointerfunction_short((void *)(x)); break; \
    case sizeof(char):  pointerfunction_char((void *)(x));  break; \
    default:            fprintf(stderr, "unknown pointer type for %p\n", (void *)(x)); \
                        break; \
    } \
} while (0)

#define pointerfunction_template(T) \
    void pointerfunction_ ## T (T *x) { *x = *x * 10; }

pointerfunction_template(int);
pointerfunction_template(short);
pointerfunction_template(char);

Then, you can use the macro like this:

int a=10;
short b=20;
char f=4;

int main () {
    pointerfunction(&a);
    pointerfunction(&b);
    pointerfunction(&f);
    return 0;
}

This technique won't work generally, though. In particular, it fails if two types have the same size. Then, you would be forced to embed the type itself into your macro call.

#define pointerfunction_call(T, x) pointerfunction_ ## T(x)

pointerfunction_template(float);
pointerfunction_template(double);

float g = 2.2;
double h = 3.1;

pointerfunction_call(float, &g);
pointerfunction_call(double, &h);
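If you do rely on the sizeof-based switch, one way to make the same-size limitation explicit is a set of compile-time checks. This is only a sketch using C11 _Static_assert; the messages are illustrative. Two case labels with the same value already fail to compile, so these checks mainly provide a clearer diagnostic.

```c
/* Guards for a sizeof-based dispatch over int, short and char:
   compilation stops with a readable message if any two of the
   three types happen to share a size on the target platform. */
_Static_assert(sizeof(int) != sizeof(short),
               "int and short have the same size; sizeof dispatch is ambiguous");
_Static_assert(sizeof(short) != sizeof(char),
               "short and char have the same size; sizeof dispatch is ambiguous");
_Static_assert(sizeof(int) != sizeof(char),
               "int and char have the same size; sizeof dispatch is ambiguous");
```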

Upvotes: 1

Kevin Vermeer

Reputation: 2852

As others have said, this is impossible in C. You are correct that int is the largest of the three types, but you seem to be missing the implications of this fact.

Why is this impossible in C?

In C, data is stored directly in memory with no meta-data overhead. A variable directly maps to data in memory. Unless you create it (violating your requirement that there be no flags or keeping track of what type of pointer is being passed), there is no information stored with the variable on things like:

  • what type it is
  • whether a variable has been initialized
  • whether a variable is in scope
  • or (for arrays/strings) the used length, or available size

as there is in other languages. Instead, this information should be maintained by the programmer, either by creating a struct to store this information or by asking the programmer to remember what's going on.
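As a rough illustration of maintaining that information by hand, a struct can carry the bookkeeping alongside the data. The byte_buffer type and the buffer_push function are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* The language stores none of this for you: the programmer records
   how much of the storage is in use and how much is available. */
typedef struct {
    char  *data;      /* caller-provided storage */
    size_t used;      /* number of bytes currently in use */
    size_t capacity;  /* total number of bytes available */
} byte_buffer;

/* Appends one byte; returns 1 on success, 0 if the buffer is full. */
int buffer_push(byte_buffer *b, char c)
{
    if (b->used >= b->capacity)
        return 0;
    b->data[b->used++] = c;
    return 1;
}
```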

C is a systems programming language, and it's suitable for systems programming in part because it doesn't have this overhead like, say, Java or C# would.

OK, but why doesn't it work in a union?

What are the implications of the various sizes of the types being pointed to? Consider the following memory diagrams, where each character is 4 bits, an int is 32 bits, a short is 16 bits, and a char is 8 bits:

Nibbles:89ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
[other ][data  ][int   ][int   ][int   ][mo][re][  ][da][ta]  // Ints
[other ][data  ][sh][or][t ][sh][or][t ][mo][re][  ][da][ta]  // Shorts
[other ][data  ][][][][][][][][][][][][][mo][re][  ][da][ta]  // Chars

Note that this is completely ignoring alignment and endianness issues; there are some platforms (including ARM, which I see in some of your other questions) where certain guarantees are made about alignment that could help you.(†)

However, the problem still remains for static memory or memory on the heap. Consider what would happen if you stored the string ABCDEFGHIJKL in your character array. Remembering that an ASCII A is 0x41, that would become the following in memory:

[other ][data  ]4142434445464748494A4B4C[mo][re][  ][da][ta]

Now imagine that you passed a pointer to C to your function which dereferences this as an integer:

                    [int   ]                                  // Int pointer to C
[other ][data  ][][][][][][][][][][][][][mo][re][  ][da][ta]  // Chars
[other ][data  ]4142434445464748494A4B4C[mo][re][  ][da][ta]
                    ^-- C is here; 0x43

Using an int pointer here will violate the C specification.

If that's not enough, and we assume your compiler behaves logically, it will attempt to dereference memory across a word boundary, which can throw a bus fault or usage fault (I forget what it actually does on ARMv7, but either one of those faults will terminate your program).

If that's still not enough, and it somehow does what's asked of it, the operation will produce a wrong answer, because you're working with the value 0x43444546 and not 0x43.
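To observe that wrong value without invoking undefined behavior, the bytes can be copied out with memcpy() instead of being read through an int pointer. This is only a sketch: read_u32_bytes is a hypothetical helper, and uint32_t stands in for the 32 bit int assumed in the diagrams.

```c
#include <stdint.h>
#include <string.h>

/* Copies the four bytes starting at p into a 32 bit value.
   memcpy has no alignment or aliasing requirements, so this is
   well defined even when p points into the middle of a char array. */
uint32_t read_u32_bytes(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Called with a pointer to the C in "ABCDEFGHIJKL", this yields a value built from the bytes 0x43, 0x44, 0x45 and 0x46, in an order that depends on the machine's endianness, rather than the single char value 0x43.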


Some footnotes about memory alignment on ARM processors

(†) On ARM, for example, the ABI specifies that the stack must be word-aligned in normal use (sp % 4 == 0), in which case your code might work, as the diagram would look like this:

0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
[other ][data  ][int   ][int   ][int   ][mo]    [re]    [  ]    [da]    [ta]
[other ][data  ][sh]    [or]    [t ]    [sh]    [or]    [t ]    [mo]    [re] ...
[other ][data  ][]      []      []      []      []      []      []      []   ...

The stack is also guaranteed to be doubleword aligned for public interfaces, although internally that alignment doesn't have to be maintained; see 5.2.1 in the AAPCS for details. Nevertheless, this isn't something you want to rely on (portable code is preferable in most cases) or should even need to know unless you're writing a compiler or raw assembly code.

Upvotes: 2

John Bode

Reputation: 123598

C cannot do "generic" functions, at least not in the "do what I mean" sense that you get with C++ templates and function overloading:

template <typename T>
void pointerfunction(T *p)
{
  *p = *p * 10;
}
...
pointerfunction(&a);
pointerfunction(&b);
pointerfunction(&c);

At some point, your function has to know what the correct type of the pointer is in order to function properly; treating every pointer as an int * will cause problems for any object that is not actually an int, including the narrower integer types short and char.

One thing you can do is write a different function to handle each type of pointer, but call those functions through a common "generic" interface, like so:

void intfunc(void *p)   { int   *lp = p; *lp *= 10; }  // void * converts implicitly in C
void charfunc(void *p)  { char  *lp = p; *lp *= 10; }
void shortfunc(void *p) { short *lp = p; *lp *= 10; }

void callfunc(void *data, void (*ptrfunc)(void *))
{
  ptrfunc(data);
}
...
callfunc(&a, intfunc);
callfunc(&b, shortfunc);
callfunc(&c, charfunc);

It isn't pretty, and unlike C++ templates it throws any pretense of type safety out the window (as soon as you start mucking with void *, you've lost any support from the compiler).

But...

This approach is pretty flexible, and you don't have to do anything too unnatural to get it to work.

Upvotes: 0

themel

Reputation: 8895

This is a bad idea. See here:

#include <stdio.h>

void pointerfunction(int *p);
short b=20;
short c=20;
typedef union
{
  int *ptr1;
  short *ptr2;
  char *ptr3;
} pointers;

int main()
{
  pointers mypointers;

  mypointers.ptr2=&b;
  pointerfunction(mypointers.ptr2);
  printf("b=%d,c=%d\n", b,c);

  mypointers.ptr2=&c;
  pointerfunction(mypointers.ptr2);
  printf("b=%d,c=%d\n", b,c);

}


void pointerfunction(int *p)
{
  *p=*p*10;
}

which, on my system, prints:

b=200,c=200
b=200,c=2000

Is that what you wanted to happen?
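One plausible reading of that output can be modelled without undefined behavior. The sketch below assumes a little-endian machine with 16 bit short and 32 bit int, and assumes that c sits directly after b in memory with zero bytes after c; none of this is guaranteed by the language. The function name scale_pair_as_word is hypothetical.

```c
#include <stdint.h>

/* Models what a single 32 bit multiply does to two adjacent 16 bit
   values: low is the short the pointer was aimed at, high is
   whatever happens to sit in the following two bytes. */
void scale_pair_as_word(uint16_t *low, uint16_t *high)
{
    uint32_t word = ((uint32_t)*high << 16) | *low;  /* the 4 bytes read as one int */
    word *= 10u;
    *low  = (uint16_t)(word & 0xFFFFu);
    *high = (uint16_t)(word >> 16);
}
```

Under those assumptions, the first call sees b = 20 and c = 20 as the single value 20 * 65536 + 20, and multiplying by 10 gives 200 * 65536 + 200, so both shorts become 200. The second call sees c = 200 followed by zero, so only c changes, to 2000.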

Upvotes: 1

Joe

Reputation: 47739

This can't work. You have no way of knowing the type of the pointer you are storing in the union.

You can define a struct which has the union as a member, and another field that says what kind of value you are using. Remember that on most platforms the size of a pointer is the same no matter what it points to. I have not tested this, but you get the idea.

#include <stdio.h>

int a=10;
short b=20;
char theChar=4;

#define INT 1
#define SHORT 2
#define CHAR 4

typedef union 
{
int *ptr1;
short *ptr2;
char *ptr3;
}pointers;

typedef struct {
    pointers ps;
    int type;   // which union member is valid: INT, SHORT or CHAR
} myStruct;


void pointerfunction(myStruct theStruct)
{
    // the type tag decides which union member may be used
    switch (theStruct.type) {
        case INT:
            *(theStruct.ps.ptr1) += 10;
            break;
        case SHORT:
            *(theStruct.ps.ptr2) += 20;
            break;
        case CHAR:
            *(theStruct.ps.ptr3) += 30;
            break;
    }
}


int main()
{
    // Example with a char
    myStruct theStruct;
    theStruct.type = CHAR;

    theStruct.ps.ptr3=&theChar;

    pointerfunction(theStruct);
    printf("%d\n", *(theStruct.ps.ptr3));
}

Upvotes: 0
