Mounir Cobra

Reputation: 21

Why are there 3 unwanted characters at the beginning of a string in C?

I have a problem with this C program, which reads the content of a file, copies it into a string, and prints it. When I allocate a string, it always contains 3 strange characters. I can get rid of them by putting '\0' at the beginning, which initializes it to an empty string, as shown in parts 1 and 2. But when I read the file, the 3 characters are still there even with that technique, as shown in part 3.

Does anyone know why those 3 characters are printed, given that they don't appear if I copy the string into another file? And why do they still appear when I read the file?

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define length 20

int main() { /////////PART 1

    char *T = (char*) malloc((length+1)*sizeof(char)) ;
    printf("%s\n", T); 

    strcat(T,  "hello") ;
    printf("%s\n", T); 

////////////////////////////////////PART 2  

    char *M = (char*) malloc((length+1)*sizeof(char)) ; M[0] = '\0' ;
    printf("%s\n", M);

    strcat(M,  "hello") ;
    printf("%s\n", M); 

////////////////////////////////////PART 3 

    FILE *fil = fopen("test.txt", "r") ;
    char *S = (char*) malloc((length+1)*sizeof(char)) ; S[0] = '\0' ;


    fread(S, sizeof(char), length, fil); 
    S[length] = '\0' ;

    printf("%s\n", S) ;
    fclose(fil) ;
}

Upvotes: 0

Views: 79

Answers (1)

Serge Ballesta

Reputation: 149075

Could the 3 characters be ï»¿ or ´╗┐? It is common to prepend a Byte Order Mark (BOM) at the beginning of Unicode text files. The BOM is the Unicode character U+FEFF.

In a UTF-8 encoded file it appears as the 3 bytes "\xef\xbb\xbf"; in UTF-16 Little Endian, as the 2 bytes "\xff\xfe"; and in UTF-16 Big Endian, as the 2 bytes "\xfe\xff".
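As a rough illustration of those byte sequences, here is a minimal sketch that classifies the first bytes of a buffer. The names `bom_kind` and `detect_bom` are hypothetical, chosen only for this example, and the sketch assumes only the three encodings listed here matter.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/* Classify the first bytes of a buffer of length n.
   Only UTF-8 and UTF-16 BOMs are recognized (an assumption of this sketch). */
static enum bom_kind detect_bom(const unsigned char *p, size_t n)
{
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16_LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

Note that a UTF-32 Little Endian file also starts with the bytes FF FE, so this sketch would report it as UTF-16 LE; handling UTF-32 would need an extra check.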

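If you are reading a file that contains a BOM, seeing those special characters is normal. Setting S[0] = '\0' before the read does not help, because fread overwrites the buffer starting at S[0].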
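If the file does turn out to be UTF-8 with a BOM, one possible way to keep those bytes out of the string is to skip them before reading the text. The sketch below assumes a seekable file opened at its start. The helper names `skip_utf8_bom` and `read_text` are hypothetical, not part of the C library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if the stream starts with the UTF-8 BOM (EF BB BF),
   leave the position just after it; otherwise go back to the start.
   Returns 1 if a BOM was skipped, 0 otherwise. */
static int skip_utf8_bom(FILE *fp)
{
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    unsigned char head[3];
    size_t n = fread(head, 1, sizeof head, fp);

    if (n == sizeof head && memcmp(head, bom, sizeof bom) == 0)
        return 1;
    rewind(fp);   /* no BOM: the bytes just read are real content */
    return 0;
}

/* Hypothetical helper: read up to cap - 1 bytes that follow any BOM into buf
   and NUL-terminate it. cap must be at least 1.
   Returns the number of bytes stored (0 if the file cannot be opened). */
static size_t read_text(const char *path, char *buf, size_t cap)
{
    size_t n = 0;
    FILE *fp = fopen(path, "rb");

    if (fp != NULL) {
        skip_utf8_bom(fp);
        n = fread(buf, 1, cap - 1, fp);
        fclose(fp);
    }
    buf[n] = '\0';   /* terminate after the bytes actually read */
    return n;
}
```

With the question's buffer this could be called as read_text("test.txt", S, length + 1). Terminating at the count returned by fread, rather than at S[length], also avoids printing leftover uninitialized bytes when the file is shorter than the buffer.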

Upvotes: 2
