zeme

Reputation: 437

Inconsistent behaviour when invoking same function from different places

As part of a team assignment I have to create a program that reads an assembly source file and produces binary code for a specific architecture.

I created the function tokenize to split a string into tokens based on a pattern provided.

The problem I encountered is that when toks_print() is invoked from main(), the last two lines (tokens) are illegible, whereas when toks_print() is called from read_assembly_program() the output is as expected.

This is the output as printed onto stdout:

This is the file read:
ldr r0,=0x20200004
ldr r2,[r0]
cmp r2,r0
andeq r0,r0,r0

Printing 4 tokens:
ldr r0,=0x20200004
ldr r2,[r0]
cmp r2,r0
andeq r0,r0,r0

Printing 4 tokens:
ldr r0,=0x20200004
ldr r2,[r0]
c¿
\370\277_\377¿


My question is: why is this happening? I'm sure it has something to do with pointers but for the life of me I can't figure it out. This also happens with any other file I tried: the last lines are missing or indecipherable.

For completeness this is the content of gpio_0.s:
ldr r0,=0x20200004
ldr r2,[r0]
cmp r2,r0
andeq r0,r0,r0

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

//////////////////////////////////////////////////////////////////////

typedef struct Tokens
{
    char **toks;
    unsigned int tokno;
} Tokens;


Tokens *toks_new() 
{   
    Tokens *tokens = malloc(sizeof(Tokens));
    tokens->toks   = malloc(sizeof(char **));
    return tokens;
}


void toks_free(Tokens *tokens)
{
    free(tokens);
    free(tokens->toks);
}


void toks_print(Tokens *tokens)
{
    printf("Printing %i tokens:\n", tokens->tokno);
    for (int i = 0; i < tokens->tokno; i++) 
    {
        printf("%s\n", tokens->toks[i]);
    }
    printf("\n\n");
}


Tokens *tokenize(char *str, const char *delim)
{
    Tokens *tokens = toks_new();
    for (int n = 0; ; n++)
    {
        if (n != 0) str = NULL;
        char *token = strtok(str, delim);
        if (token == NULL)
        {
            tokens->tokno = n;
            break;
        } 
        tokens->toks[n] = token;
    }
    return tokens;
}

////////////////////////////////////////////////////////////////////////////////

Tokens *program = NULL;

////////////////////////////////////////////////////////////////////////////////

void read_assembly_program(const char *filepath)
{
    FILE *file = fopen(filepath, "rt");

    fseek(file, 0, SEEK_END);
    long bytes = ftell(file);
    rewind(file);

    char buffer[bytes];
    fread(buffer, 1, bytes, file);

    // Without this I get an indecipherable line at the end... But why?
    buffer[bytes-1] = '\0';

    // What is printed is exactly what I expect, the whole content of the file
    printf("This is the file read:\n%s\n\n\n", buffer);

    program = tokenize(buffer, "\n");
    // This prints the tokens as expected
    toks_print(program);
}

////////////////////////////////////////////////////////////////////////////////

int main(int argc, char **argv) 
{
    const char * file = "gpio_0.s";

    read_assembly_program(file);

    // But here the last two lines messed up!
    toks_print(program);

  return EXIT_SUCCESS;
}

Upvotes: 0

Views: 80

Answers (1)

alk

Reputation: 70893

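tokenize() stores pointers into buffer in the data that program refers to. strtok() does not copy anything; each token it returns points into the string it was given.

buffer is declared local to read_assembly_program(), so its memory is no longer valid once read_assembly_program() has returned. The stored token pointers then dangle, and printing them from main() is undefined behaviour.

To get around this, pass read_assembly_program() a reference to a buffer owned by the caller, or allocate buffer on the heap using malloc().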
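A minimal sketch of the malloc() variant, not a drop-in replacement for the code in the question. The names read_whole_file() and tokenize_owned() and the extra buffer member of Tokens are assumptions made for illustration. The sketch reserves one extra byte for the terminating '\0' (fread() does not add one, which is probably why the last line looked garbled without the manual termination), and it grows toks with realloc(), because toks_new() in the question reserves room for only a single pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Tokens
{
    char *buffer;        /* heap-allocated text; every token points into it */
    char **toks;
    unsigned int tokno;
} Tokens;

/* Free the token array and the text it points into, then the struct itself. */
void toks_free(Tokens *tokens)
{
    if (tokens == NULL) return;
    free(tokens->toks);
    free(tokens->buffer);
    free(tokens);
}

/* Hypothetical helper: read a whole file into a malloc'd, NUL-terminated
   string. Returns NULL on failure; the caller owns the result. */
char *read_whole_file(const char *filepath)
{
    FILE *file = fopen(filepath, "rb");
    if (file == NULL) return NULL;

    if (fseek(file, 0, SEEK_END) != 0) { fclose(file); return NULL; }
    long bytes = ftell(file);
    if (bytes < 0) { fclose(file); return NULL; }
    rewind(file);

    char *buffer = malloc((size_t)bytes + 1);   /* +1 for the terminator */
    if (buffer == NULL) { fclose(file); return NULL; }

    size_t got = fread(buffer, 1, (size_t)bytes, file);
    fclose(file);
    buffer[got] = '\0';   /* fread() does not terminate the string */
    return buffer;
}

/* Hypothetical variant of tokenize(): takes ownership of a heap buffer,
   so the tokens stay valid until toks_free() is called. */
Tokens *tokenize_owned(char *buffer, const char *delim)
{
    assert(buffer != NULL && delim != NULL);

    Tokens *tokens = malloc(sizeof *tokens);
    if (tokens == NULL) { free(buffer); return NULL; }
    tokens->buffer = buffer;
    tokens->toks   = NULL;
    tokens->tokno  = 0;

    size_t cap = 0;
    for (char *tok = strtok(buffer, delim); tok != NULL; tok = strtok(NULL, delim))
    {
        if (tokens->tokno == cap)
        {
            /* Double the capacity whenever the array is full. */
            size_t new_cap = cap ? cap * 2 : 8;
            char **grown = realloc(tokens->toks, new_cap * sizeof *grown);
            if (grown == NULL) { toks_free(tokens); return NULL; }
            tokens->toks = grown;
            cap = new_cap;
        }
        tokens->toks[tokens->tokno++] = tok;
    }
    return tokens;
}
```

With these helpers, read_assembly_program() could set program = tokenize_owned(read_whole_file(filepath), "\n") after checking the file contents for NULL, and main() would call toks_free(program) once it is done with the tokens.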


Update

Two (not so nice) alternative solutions:

  • Define the buffer globally.
  • Declare the "local" buffer as static.
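A small sketch of the static variant, under the assumption of a fixed upper bound on the program size. MAX_PROGRAM_BYTES and copy_to_static() are hypothetical names, not part of the question's code. It shows why this works and also why it is "not so nice": the buffer outlives the function call, but there is only one of it, so each call overwrites what earlier pointers refer to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROGRAM_BYTES 4096   /* hypothetical fixed limit, not from the question */

/* Copies text into a static buffer and returns a pointer to it,
   or NULL if the text (plus terminator) does not fit. */
char *copy_to_static(const char *text)
{
    /* static storage duration: the array exists for the whole program run,
       so pointers into it remain valid after this function returns. */
    static char buffer[MAX_PROGRAM_BYTES];

    assert(text != NULL);
    size_t len = strlen(text);
    if (len >= sizeof buffer) return NULL;

    memcpy(buffer, text, len + 1);   /* len + 1 copies the '\0' as well */
    return buffer;
}
```

Tokens produced by strtok() on such a buffer stay readable in main(), but a second call to the function replaces the contents underneath them, and the function is not safe to use from several threads at once.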

Upvotes: 2
