Juan Torres

Reputation: 744

Replacing some characters in a string with characters stored in an array inserts garbage

I have been trying to replace the characters in a string with characters stored in some arrays:

char encode_table[122];
char decode_table[122];

...

int main()
{
    memset(encode_table, 0, 122);
    memset(decode_table, 0, 122);
    ...

To populate the table, I use a file in the format

a b
c d

where a maps to b, c maps to d, etc. I store the mappings in the array, using as indices the ASCII values of the mapped characters.

encode_table[97] // Asking for the mapping of 'a'. Returns 'b'
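
For reference, populating the tables from such a file might look roughly like the sketch below. It is not the original loading code, which is elided above: the helper name load_mappings and the TABLE_SIZE of 256 (one slot per possible unsigned char value) are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 256  /* assumption: one slot per possible unsigned char value */

static char encode_table[TABLE_SIZE];
static char decode_table[TABLE_SIZE];

/* Hypothetical helper: reads "x y" pairs, one pair per line, from an
   already-open stream. Returns the number of mappings stored. */
static int load_mappings(FILE *in)
{
    char from, to;
    int count = 0;

    /* Zero means "no mapping for this character". */
    memset(encode_table, 0, sizeof encode_table);
    memset(decode_table, 0, sizeof decode_table);

    /* The leading space in the format skips newlines and other whitespace. */
    while (fscanf(in, " %c %c", &from, &to) == 2) {
        encode_table[(unsigned char)from] = to;   /* 'a' -> 'b' */
        decode_table[(unsigned char)to] = from;   /* 'b' -> 'a' */
        count++;
    }
    return count;
}
```

With the two-line file shown above, encode_table['a'] would hold 'b' and decode_table['d'] would hold 'c'.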

After I map all the characters, I parse a file, line by line. Each line is processed by another function that is supposed to replace the characters that have a mapping and leave the rest alone.

void display(char * filename){
    char buffer[255];
    FILE * file = fopen(filename, "r");
    ...
    while(fgets(buffer, sizeof(buffer), file)){
        display_line(buffer);
    }
}

void display_line(char * line){
    char c;
    char c_r;
    char format_str[255];

    if(encode || decode){
        for(int i = 0; i < strlen(line); i++){
            c = line[i];
            c_r = (encode ? encode_table[(int)c] : decode_table[(int)c]);

            if((int)c != 0){  // don't print empty chars in the buffer
                if(c == EOF){
                    break;
                }
                if((int)c_r != 0){
                    format_str[strlen(format_str)] = c_r;
                }
                else{
                    format_str[strlen(format_str)] = c;
                }
            }
        }
        printf("%s", format_str);
        memset(format_str, 0, strlen(format_str)); // reset char array for next iteration
    }
}

As far as I can tell, the encode_table and decode_table are built properly (in this example, I'm only mapping English-alphabet characters to other English-alphabet characters, and the mapping is 1-to-1):

encode_table:

{0: , 1: , 2: , 3: , 4: , 5: , 6: , 7: , 8: , 9: , 10: , 11: , 12: , 13: , 14: , 15: , 16: , 17: , 18: , 19: , 20: , 21: , 22: , 23: , 24: , 25: , 26: , 27: , 28: , 29: , 30: , 31: , 32: , 33: , 34: , 35: , 36: , 37: , 38: , 39: , 40: , 41: , 42: , 43: , 44: , 45: , 46: , 47: , 48: , 49: , 50: , 51: , 52: , 53: , 54: , 55: , 56: , 57: , 58: , 59: , 60: , 61: , 62: , 63: , 64: , 65: Z, 66: Y, 67: X, 68: W, 69: V, 70: U, 71: T, 72: S, 73: R, 74: Q, 75: P, 76: O, 77: N, 78: M, 79: L, 80: K, 81: J, 82: I, 83: H, 84: G, 85: F, 86: E, 87: D, 88: C, 89: B, 90: A, 91: , 92: , 93: , 94: , 95: , 96: , 97: z, 98: y, 99: x, 100: w, 101: v, 102: u, 103: t, 104: s, 105: r, 106: q, 107: p, 108: o, 109: n, 110: m, 111: l, 112: k, 113: j, 114: i, 115: h, 116: g, 117: f, 118: e, 119: d, 120: c, 121: b, 122: a, }

decode_table:

{0: , 1: , 2: , 3: , 4: , 5: , 6: , 7: , 8: , 9: , 10: , 11: , 12: , 13: , 14: , 15: , 16: , 17: , 18: , 19: , 20: , 21: , 22: , 23: , 24: , 25: , 26: , 27: , 28: , 29: , 30: , 31: , 32: , 33: , 34: , 35: , 36: , 37: , 38: , 39: , 40: , 41: , 42: , 43: , 44: , 45: , 46: , 47: , 48: , 49: , 50: , 51: , 52: , 53: , 54: , 55: , 56: , 57: , 58: , 59: , 60: , 61: , 62: , 63: , 64: , 65: Z, 66: Y, 67: X, 68: W, 69: V, 70: U, 71: T, 72: S, 73: R, 74: Q, 75: P, 76: O, 77: N, 78: M, 79: L, 80: K, 81: J, 82: I, 83: H, 84: G, 85: F, 86: E, 87: D, 88: C, 89: B, 90: A, 91: , 92: , 93: , 94: , 95: , 96: , 97: z, 98: y, 99: x, 100: w, 101: v, 102: u, 103: t, 104: s, 105: r, 106: q, 107: p, 108: o, 109: n, 110: m, 111: l, 112: k, 113: j, 114: i, 115: h, 116: g, 117: f, 118: e, 119: d, 120: c, 121: b, 122: a, }

When I try running the program on a text file, most of the characters seem to map correctly, but there is also a lot of garbage, especially in between characters from the original text file.

Original:

For some reason, this program will not work.

Program output:

Uli hln?I?Uv iv4M1zhlm?I?U, gs4M1rh k?I?Uilti??%1zn droo mlg dlip.

Most characters seem to be correctly mapped ('Uli hln' is 'For som' in the original file), but then comes a bunch of garbage (?I?U), after which the mapping continues (v is e in the original), and so forth.

I've been staring at this for a couple of hours. Any ideas?

Upvotes: 4

Views: 101

Answers (3)

Roney Michael

Reputation: 3994

strlen(format_str) yielding inconsistent values ought to be your issue; memset it before you start using it or use i instead.

Also, your encode/decode arrays are of size 122, but contain 123 elements as per your data; this is a buffer overflow which might cause memory stomping leading to unexpected behavior.
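
One way to rule out the overflow is to size the tables for every possible byte value and index them through unsigned char. The following is only a sketch under that assumption; TABLE_SIZE and the helper map_char are illustrative names, not part of the question's code.

```c
#include <limits.h>

/* Assumption: one entry per possible unsigned char value, so any byte
   read from a file is a valid index (256 entries on typical platforms). */
#define TABLE_SIZE (UCHAR_MAX + 1)

static char encode_table[TABLE_SIZE];

/* Hypothetical helper: returns the mapped character, or the original
   character when the table holds no mapping (a zero entry). */
static char map_char(const char *table, char c)
{
    /* Plain char may be signed; converting to unsigned char first
       guarantees the index is never negative and is below TABLE_SIZE. */
    unsigned char idx = (unsigned char)c;
    return table[idx] != 0 ? table[idx] : c;
}
```

With this layout, index 122 ('z') is in bounds, and bytes above 127 cannot produce a negative index.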

Upvotes: 2

John Hammond

Reputation: 477

The first problem is that your format_str is never initialized, so it is not null-terminated. The loop only copies indices 0 to 7 for a string containing 8 characters and never writes a terminator. So your very first format_str[strlen(format_str)] is already putting data into random memory, maybe even your encode table.

Upvotes: 1

Sergey Kalinichenko

Reputation: 726779

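Your code relies on format_str being set to all zeros initially, so that strlen(format_str) returns 0 on the first call and each appended character is followed by a terminator. However, there is no memset for that before the loop. This is how the random "garbage" characters end up among the characters of the "good" output.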

Although memset(format_str, 0, sizeof(format_str)) will fix this problem, adding a pointer or an index to which you are writing would be even better:

int j = 0;
...
format_str[j++] = c_r;
...
// After the loop is over, null-terminate the string:
format_str[j++] = '\0';
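
Put together, the index-based approach could look something like the sketch below. It is an illustration rather than a drop-in replacement: the function name translate_line, its parameters, and the assumption that the table has one entry per unsigned char value are all mine, not the question's.

```c
#include <stddef.h>

/* Hypothetical helper: copies `line` into `out`, replacing every character
   that has a non-zero entry in `table`. Assumes `table` has one entry per
   unsigned char value. Writes at most out_size - 1 characters plus a
   terminator and returns the number of characters written. */
static size_t translate_line(const char *line, const char *table,
                             char *out, size_t out_size)
{
    size_t j = 0;  /* next write position; replaces strlen(format_str) */

    if (out_size == 0) {
        return 0;
    }
    for (size_t i = 0; line[i] != '\0' && j + 1 < out_size; i++) {
        unsigned char c = (unsigned char)line[i];
        char c_r = table[c];
        out[j++] = (c_r != 0) ? c_r : line[i];
    }
    out[j] = '\0';  /* terminate once, after the loop */
    return j;
}
```

Because j tracks the write position explicitly, the output buffer does not need to be zeroed beforehand, and strlen is never called on partially written data.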

Upvotes: 2
