nflacco
nflacco

Reputation: 5082

c array error- expect elements to be initialized, but they are not, and my setters appear not to work, sometimes

I'm having trouble creating a 2d array and filling it with values and then reading the array and getting totally different values. The strange thing is I have two maintain two arrays, and one of them is storing values correctly, and the other is not. I'm pretty sure I'm not overwriting elements either. I assume I'm making some stupid error that is obvious to someone who isn't horrible with C.

Please note I'm implementing the viterbi algorithm, but the general algo I understand and have a working python implementation of, it's just the arrays in c are giving me grief.

What I'm doing:

1) Malloc two arrays, they are used as 2D arrays, but I allocate a contiguous block of memory. I do not initialize the arrays explicitly as I should fill every entry in them as go through the forward step of the viterbi algo.

double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
int *best_path_table = malloc(sizeof(int) * n_samples * n_states);

2) For the forward portion of the viterbi algo, I walk through the observed data, and calculate the most likely possible states and probabilities for each state.

for (t = 1; t < n_samples; t++) // for each piece of observed data
{
    for (i = 0; i < n_states; i++)
    {
        max_state_index = 0;
        max_p = -DBL_MAX;
        // calculate the max state and probability by looping through all the states
        // yada yada...

        // IMPORTANT PART: We set the array values here
        viterbi_table[t * n_samples + i] = max_p;
        best_path_table[t * n_samples + i] = max_state_index;
        printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n", 
            i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
    }

    // IMPORTANT PART: print out rows of best path table to see if what we thought we inserted is in there
    if (debug)
    {
        printf("best_path, [ ", t);
        for (i = 0; i < n_states; i++)
        {
            printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
        }
        printf("]\n");
    }
}

3) I run the code, and instead of having the array elements I set match with what I thought I set them to, I get big negative or positive numbers that look like uninitialized elements. What gives? I assigned a value to those blocks. Here's a selected portion of output that shows the problem.

t=36 => sample=X
    best_path_table[0][36] or [1404] = 0 => 0
    best_path_table[1][36] or [1405] = 0 => 0
    best_path_table[2][36] or [1406] = 0 => 0
    best_path_table[3][36] or [1407] = 0 => 0
    ...
best_path, [ [1404], 1399607453 [1405], -1070347604 [1406], 1399607453 [1407], 0 ... ]

By way of the contrast, the below one is correct.

t=37 => sample=X
    best_path_table[0][37] or [1443] = 3 => 3
    best_path_table[1][37] or [1444] = 3 => 3
    best_path_table[2][37] or [1445] = 3 => 3
    ...
best_path, [ [1443], 3 [1444], 3 [1445], ... ]

When I run the code for a short piece of data, say < 12 observations, I don't have problems like this. When I run it for longer data, I have most of the best path table not filled correctly- it seems like the pattern is like so:

observation#
1) correct
2-3) garbage
4) correct
4-5) garbage
and so on

Code

See this gist. It has no dependencies to 3rd party libraries.

EDIT:

The first row of the viterbi table is initialized in a step prior to the forward part of the algorithm.

for (i = 0; i < n_states; i++)
{
    state_i = states[i];
    sample_t = samples[0];
    viterbi_table[i*n_samples]
        = prior(state_i, 0, true) + emission(sample_t, state_i, true);
}

EDIT2:

In a prior version of the code, I was doing the more standard 2D array initialization (in non-contiguous blocks) and corresponding array accesses. This gave me bus error consistently for larger pieces of input data, which totally makes sense.

double **viterbi_table = malloc(sizeof * viterbi_table * n_states);
int **best_path_table = malloc(sizeof * best_path_table * n_states);
...
viterbi_table[j][t - 1] = ...

EDIT3, Comments on solution:

It turns out this was a stupid subscript mistake. The viterbi and best path arrays are n_samples * n_states in size, that is 17 * 39 = 663. That rules out any an index of 1404 as in my example.

The specific problem is that my array indexing was a mess because I mistakenly used n_samples instead of n_states. For a given observation index t (30), and a given state index i (14), the calculation looks as follows:

// Original, wrong
t * n_samples + i = 30 * 39 + 14 = 1184

// New, correct
t * n_states + i = 30 * 17 + 14 = 524

The variable t already encodes the number of samples we are at, so we just need to multiply that by the number of states.

EDIT4, Fixed Code: Fixed code can be found here. I've also adjusted the emission and transition probabilities for my use cases.

Upvotes: 4

Views: 159

Answers (2)

Jonathan Leffler
Jonathan Leffler

Reputation: 753665

I added a macro CHECK_SUBSCRIPT to check that subscripts are in range, using assert. It fired — there's a subscript out of control.

Source

Comments removed.

#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHECK_SUBSCRIPT(x, n_samples)   assert((x) >= 0 && (x) < (n_samples) * n_states)

typedef enum { false, true } bool;
typedef double (*prob_function_def)(char, char, bool);

int n_states = 17;
int n_observations = 17;
char states[17] =
    { 'X', '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
char observations[17] =
    { 'X', '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

void viterbi_log (
                char *samples,
                int n_samples,
                char *best_path,
                prob_function_def prior,
                prob_function_def transition,
                prob_function_def emission,
                bool debug
             )
{
    printf("\nviterbi...\n");

    int i, j, t, max_state_index;
    char state_i, state_j, sample_t;
    double trans_p, max_p;

    double *viterbi_table = malloc(sizeof(double) * n_samples * n_states);
    int *best_path_table = malloc(sizeof(int) * n_samples * n_states);

    for (int n33 = 0; n33 < n_samples * n_states; n33++)
    {
        CHECK_SUBSCRIPT(n33, n_samples);
        viterbi_table[n33] = 3.14159;
        best_path_table[n33] = 314159;
    }

    if (debug) printf("\nInitialization:\n");
    for (i = 0; i < n_states; i++)
    {
        state_i = states[i];
        sample_t = samples[0];
        CHECK_SUBSCRIPT(i*n_samples, n_samples);
        viterbi_table[i*n_samples]
            = prior(state_i, 0, true) + emission(sample_t, state_i, true);
        if (debug)
        {
            printf("\t");
            printf("log(prior[%c]) + log(emission[%c][%c]) = %e\n",
                state_i, sample_t, state_i, viterbi_table[i*n_samples]);
        }
    }

    if (debug) printf("\nForward:\n");
    for (t = 1; t < n_samples; t++)
    {
        sample_t = samples[t];
        if (debug) printf("t=%d => sample=%c\n", t, sample_t);
        for (i = 0; i < n_states; i++)
        {
            state_i = states[i];
            max_state_index = 0;
            max_p = -DBL_MAX;

            for (j = 0; j < n_states; j++)
            {
                state_j = states[j];
                CHECK_SUBSCRIPT(((t-1)*n_samples)+j, n_samples);
                trans_p = viterbi_table[((t - 1) * n_samples) + j]
                    + transition(state_i, state_j, true)
                    + emission(sample_t, state_j, true);
                if (trans_p > max_p)
                {
                    max_state_index = j;
                    max_p = trans_p;
                }
            }

            CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
            viterbi_table[t * n_samples + i] = max_p;
            best_path_table[t * n_samples + i] = max_state_index;
            printf("\tbest_path_table[%d][%d] or [%d] = %d => %d\n",
                    i, t, t * n_samples + i, best_path_table[t * n_samples + i], max_state_index);
        }

        if (debug)
        {
            printf("best_path, [ ");
            for (i = 0; i < n_states; i++)
            {
                CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
                printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
            }
            printf("]\n");
        }
    }

    if (debug)
    {
        printf("\nbest path table:\n");
        for (t = n_samples - 1; t > 0; t--)
        {
            printf("t=%d, [ ", t);
            for (i = 0; i < n_states; i++)
            {
                CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
                printf("[%d], %d ", t * n_samples + i, best_path_table[t * n_samples + i]);
            }
            printf("]\n");
        }
    }

    free(viterbi_table);
    free(best_path_table);
}

double prior_prob (char state_i, char state_j, bool log_prob)
{
    if (!log_prob)
        return 0.25;
    else
        return -1.3862943611198906;
}

double transition_prob (char state_i, char state_j, bool log_prob)
{
    if (!log_prob)
    {
        if (state_i == 0 && state_j == 0)
            return 0.9;
        else if (state_i == 0 || state_j == 0)
            return 0.1;
        else if (state_i == state_j)
            return 0.9;
        else
            return 0.1;
    }
    else
    {
        if (state_i == 0 && state_j == 0)
            return -0.10536051565782628;
        else if (state_i == 0 || state_j == 0)
            return -2.3025850929940455;
        else if (state_i == state_j)
            return -0.10536051565782628;
        else
            return -2.3025850929940455;
    }
}

double emission_prob (char observation, char state, bool log_prob)
{
    if (!log_prob)
    {
        if (state == observation)
            return 0.8;
        else
            return 0.2;
    }
    else
    {
        if (state == observation)
            return -0.2231435513142097;
        else
            return -1.6094379124341003;
    }
}

void dtmf_example()
{
    bool debug = true;
    char *samples = "XXXXXXXXXXXXXX6X66XXXXXXXXXXXXXXXXXXXXX";
    int n_samples = strlen(samples);
    char *best_path = malloc(sizeof(int) * n_samples);

    viterbi_log(samples, n_samples, best_path,
        &prior_prob, &transition_prob, &emission_prob, debug);

    printf("\nbest path = { ");
    for (int t = 0; t < n_samples; t++)
        printf("%d ", best_path[t]);
    printf("}\n");
}

int main(void)
{
    dtmf_example();
    return 0;
}

Assertion fired:

best_path_table[13][16] or [637] = 0 => 0
best_path_table[14][16] or [638] = 0 => 0
best_path_table[15][16] or [639] = 0 => 0
best_path_table[16][16] or [640] = 0 => 0
best_path, [ [624], 0 [625], 0 [626], 0 [627], 0 [628], 0 [629], 0 [630], 0 [631], 7 [632], 0 [633], 0 [634], 0 [635], 0 [636], 0 [637], 0 [638], 0 [639], 0 [640], 0 ]
t=17 => sample=6
Assertion failed: ((t*n_samples+i) >= 0 && (t*n_samples+i) < (n_samples) * n_states), function viterbi_log, file viterbi3.c, line 89.
Abort trap: 6

That's the CHECK_SUBSCRIPT at:

        CHECK_SUBSCRIPT(t*n_samples+i, n_samples);
        viterbi_table[t * n_samples + i] = max_p;
        best_path_table[t * n_samples + i] = max_state_index;

Upvotes: 1

Drew McGowen
Drew McGowen

Reputation: 11706

It would appear that because t starts with 1, the first row is never initialized. I'm guessing that each row depends on the previous, so the junk values propagate through. The fact that one particular case worked is purely by chance (the uninitialized data happened to be valid).

Upvotes: 0

Related Questions