sam_k

Reputation: 6023

Why is file size different when the files have the same number of characters?

When I get the file size using stat(), it reports different sizes for files that contain the same number of characters. Why does it behave like this?

When "huffman.txt" contains a simple string like "Hi how are you" it gives file_size = 14. But when "huffman.txt" contains a string like "ά­SUä5Ñ®qøá"F" it gives file size = 30.

#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    FILE* original_fileptr = fopen("huffman.txt", "rb");
    if (original_fileptr == NULL) {
        printf("ERROR: fopen fail in %s at %d\n", __func__, __LINE__);
        return 1;
    }
    fclose(original_fileptr);
    /* stat() fills this structure with the file's metadata */
    struct stat stp = { 0 };
    if (stat("huffman.txt", &stp) != 0) {
        perror("stat");
        return 1;
    }
    /* st_size is the size of the file in bytes */
    long long filesize = (long long)stp.st_size;
    printf("\nFile size is %lld\n", filesize);
    return 0;
}

Upvotes: 0

Views: 1425

Answers (2)

durgaps

Reputation: 56

This has to do with encoding.

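stat() reports the size in bytes, not in characters. Plain English text uses only ASCII characters, and each of those is stored in one byte. Characters outside ASCII need a Unicode encoding, where a single character can take more than one byte: in UTF-8, for example, such a character takes two to four bytes.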

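The easiest way to see what is happening is to read the file one byte at a time and print each byte:

int c;  /* int rather than char, so that EOF can be detected */
/* Read the file; fp is the FILE* returned by fopen(). */
while ((c = fgetc(fp)) != EOF)
  printf("%c", c);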

You'll understand why the file size is different.
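To make the byte-level view concrete, here is a minimal sketch that prints every byte as hexadecimal and counts them, instead of printing with %c (a terminal would reassemble multi-byte characters and hide the difference). The helper name dump_bytes is hypothetical, and the sketch assumes the file is opened in binary mode as in the question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes every byte of the file at `path` to `out`
   as two hex digits and returns the number of bytes read,
   or -1 if the file cannot be opened. */
long dump_bytes(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;                          /* int, not char, so EOF is distinguishable */
    while ((c = fgetc(fp)) != EOF) {
        fprintf(out, "%02X ", (unsigned)c);
        count++;
    }
    fputc('\n', out);
    fclose(fp);
    return count;
}
```

Calling dump_bytes("huffman.txt", stdout) should return the same number that stat() reports in st_size. For the ASCII-only file, every character shows up as exactly one hex pair; for the second file, some characters show up as two or more pairs.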

Upvotes: 4

Micah Hainline

Reputation: 14427

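If you're asking why different strings with the same number of characters can have different sizes in bytes, read up on UTF-8. It is a variable-width encoding: ASCII characters take one byte each, and every other character takes two to four bytes.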
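As a rough illustration of the variable width, the sketch below compares the number of bytes in a string with the number of characters. It assumes the input is valid UTF-8; the helper names utf8_char_count and utf8_byte_count are made up for this example. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that are not continuation bytes gives the number of code points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the string (what st_size would
   report if the string were the whole file). */
size_t utf8_byte_count(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: number of code points, assuming valid UTF-8.
   Continuation bytes look like 10xxxxxx and are skipped. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "Hi how are you" both functions return 14. For the two-character string "\xC3\xA4\xC3\x91" (the UTF-8 bytes of ä and Ñ), utf8_char_count returns 2 while utf8_byte_count returns 4.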

Upvotes: 0
