Antonio Pavan
Antonio Pavan

Reputation: 55

Malloc function in dynamic lists

I'm getting started with dynamic lists and i don't understand why it is necessary to use the malloc function even when declaring the first node in the main() program, the piece of code below should just print the data contained in the first node but if i don't initialize the node with the malloc function it just doesn't work:

 struct node{
    int data;
    struct node* next;
};

void insert(int val, struct node*);

int main() {
    struct node* head ;
    head->data = 2;
    printf("%d \n", head->data);

}

Upvotes: 1

Views: 967

Answers (4)

yano
yano

Reputation: 5265

What you have invokes undefined behavior because you don't really have a node,, you have a pointer to a node that doesn't actually point to a node. Using malloc and friends creates a memory region where an actual node object can reside, and where a node pointer can point to.

In your code, struct node* head is a pointer that points to nowhere, and dereferencing it as you have done is undefined behavior (which can commonly cause a segfault). You must point head to a valid struct node before you can safely dereference it. One way is like this:

int main() {
    struct node* head;
    struct node myNode;
    head = &myNode;  // assigning the address of myNode to head, now head points somewhere
    head->data = 2;  // this is legal
    printf("%d \n", head->data);  // will print 2

}

But in the above example, myNode is a local variable, and will go out of scope as soon as the function exists (in this case main). As you say in your question, for linked lists you generally want to malloc the data so it can be used outside of the current scope.

int main() {
    struct node* head = malloc(sizeof struct node);
    if (head != NULL)
    {
        // we received a valid memory block, so we can safely dereference
        // you should ALWAYS initialize/assign memory when you allocate it.
        // malloc does not do this, but calloc does (initializes it to 0) if you want to use that
        // you can use malloc and memset together.. in this case there's just
        // two fields, so we can initialize via assignment.
        head->data = 2;
        head->next = NULL;
        printf("%d \n", head->data);

        // clean up memory when we're done using it
        free(head);
    }
    else
    {
        // we were unable to obtain memory
        fprintf(stderr, "Unable to allocate memory!\n");
    }
    return 0;
}

This is a very simple example. Normally for a linked list, you'll have insert function(s) (where the mallocing generally takes place and remove function(s) (where the freeing generally takes place. You'll at least have a head pointer that always points to the first item in the list, and for a double-linked list you'll want a tail pointer as well. There can also be print functions, deleteEntireList functions, etc. But one way or another, you must allocate space for an actual object. malloc is a way to do that so the validity of the memory persists throughout runtime of your program.


edit:

Incorrect. This absolutely applies to int and int*,, it applies to any object and pointer(s) to it. If you were to have the following:

int main() {
    int* head;
    *head = 2;  // head uninitialized and unassigned, this is UB
    printf("%d\n", *head); // UB again

    return 0;
}

this is every bit of undefined behavior as you have in your OP. A pointer must point to something valid before you can dereference it. In the above code, head is uninitialized, it doesn't point to anything deterministically, and as soon as you do *head (whether to read or write), you're invoking undefined behavior. Just as with your struct node, you must do something like following to be correct:

int main() {
        int myInt;  // creates space for an actual int in automatic storage (most likely the stack)
        int* head = &myInt;  // now head points to a valid memory location, namely myInt
        *head = 2;  // now myInt == 2
        printf("%d\n", *head);  // prints 2

        return 0;
    }

or you can do

int main() {
        int* head = malloc(sizeof int);  // silly to malloc a single int, but this is for illustration purposes
        if (head != NULL)
        {
            // space for an int was returned to us from the heap
            *head = 2; // now the unnamed int that head points to is 2
            printf("%d\n", *head);  // prints out 2
            // don't forget to clean up
            free(head);
        }
        else
        {
            // handle error, print error message, etc
        }

        return 0;
    }

These rules are true for any primitive type or data structure you're dealing with. Pointers must point to something, otherwise dereferencing them is undefined behavior, and you hope you get a segfault when that happens so you can track down the errors before your TA grades it or before the customer demo. Murphy's law dictates UB will always crash your code when it's being presented.

Upvotes: 2

Stephan Lechner
Stephan Lechner

Reputation: 35154

Statement struct node* head; defines a pointer to a node object, but not the node object itself. As you do not initialize the pointer (i.e. by letting it point to a node object created by, for example, a malloc-statement), dereferencing this pointer as you do with head->data yields undefined behaviour.

Two ways to overcome this, (1) either allocate memory dynamically - yielding an object with dynamic storage duration, or (2) define the object itself as an, for example, local variable with automatic storage duration:

(1) dynamic storage duration

int main() {
    struct node* head = calloc(1, sizeof(struct node));
    if (head) {
      head->data = 2;
      printf("%d \n", head->data);
      free(head);
    }
}

(2) automatic storage duration

int main() {
    struct node head;
    head.data = 2;
    printf("%d \n", head.data);
}

Upvotes: 1

Pablo
Pablo

Reputation: 13570

When you declare a variable, you define the type of the variable, then it's name and optionally you declare it's initial value.

Every type needs an specific amount of memory. For example int would be 32 bit long on a 32bit OS, 8 bit long on a 64.

A variable declared in a function is usually stored in the stack associated with the function. When the function returns, the stack for that function is no longer available and the variable does not longer exist.

When you need the value/object of the variable to exist even after a function returns, then you need to allocate memory on a different part of the program, usually the heap. That's exactly what malloc, realloc and calloc do.

Doing

struct node* head ;
head->data = 2;

is just wrong. You've declaring a pointer named head of type struct node, but you are not assigning anything to it. So it points to an unspecified location in memory. head->data = 2 tries to store a value at an unspecified location and the program will most likely crash with a segfault.

In main you could do this:

int main(void)
{
    struct node head;
    head.data = 2;
    printf("%d \n", head.data);
    return 0;
}

head will be saved in the stack and will persist as long as main doesn't return. But this is only a very small example. In a complex program where you have many more variables, objects, etc. it's a bad idea to simply declare all variables you need in main. So it's best that objects get created when they are needed.

For example you could have a function that creates the object and another one that calls create_node and uses that object.

struct node *create_node(int data)
{
    struct node *head = malloc(sizeof *head);
    if(head == NULL)
        return NULL; // no more memory left

    head->data = data;
    head->next = NULL;

    return head;
}

struct node *foo(void)
{
    struct node *head = create_node(112);

    // do somethig with head

    return head;
}

Here create_node uses malloc to allocate memory for one struct node object, initializes the object with some values and returns a pointer to that memory location. foo calls create_node and does something with it and it returns the object. If another function calls foo, this function will get the object.

There are also other reasons for malloc. Consider this code:

void foo(void)
{
    int numbers[4] = { 1, 3, 5, 7 };

    ...
}

In this case you know that you will need 4 integers. But sometimes you need an array where the number of elements is only known during runtime, for example because it depends on some user input. For this you can also use malloc.

void foo(int size)
{
    int *numbers = malloc(size * sizeof *numbers);

    // now you have "size" elements
    ...
    free(numbers);   // freeing memory
}

When you use malloc, realloc, calloc, you'll need to free the memory. If your program does not need the memory anymore, you have to use free (like in the last example. Note that for simplicity I omitted the use of free in the examples with struct head.

Upvotes: 3

Dúthomhas
Dúthomhas

Reputation: 10038

You don’t technically, but maintaining all nodes with the same memory pattern is only an advantage to you, with no real disadvantages.

Just assume that all nodes are stored in the dynamic memory.

Your “insert” procedure would be better named something like “add” or (for full functional context) “cons”, and it should return the new node:

struct node* cons(int val, struct node* next)
{
  struct node* this = (struct node*)malloc( sizeof struct node );
  if (!this) return next;  // or some other error condition!
  this->data = val;
  this->next = next;
  return this;
}

Building lists is now very easy:

int main()
{
  struct node* xs = cons( 2, cons( 3, cons( 5, cons( 7, NULL ) ) ) );
  // You now have a list of the first four prime numbers.

And it is easy to handle them.

  // Let’s print them!
  {
    struct node* p = xs;
    while (p)
    {
      printf( "%d ", p->data );
      p = p->next;
    }
    printf( "\n" );
  }

  // Let’s get the length!
  int length = 0;
  {
    struct node* p = xs;
    while (p)
    {
      length += 1;
      p = p->next;
    }
  }
  printf( "xs is %d elements long.\n", length );

By the way, you should try to be as consistent as possible when naming things. You have named the node data “data” but the constructor’s argument calls it “val”. You should pick one and stick to it.

Also, it is common to:

typedef struct node node;

Now in every place except inside the definition of struct node you can just use the word node.

Oh, and I almost forgot: Don’t forget to clean up with a proper destructor.

node* destroy( node* root )
{
  if (!root) return NULL;
  destroy( root->next );
  free( root );
  return NULL;
}

And an addendum to main():

int main()
{
  node* xs = ...

  ...

  xs = destroy( xs );
}

Upvotes: 3

Related Questions