Doritos

Reputation: 423

Embedding any file into gcc compiled program and accessing it inside program

I am making a program and want to embed the contents of a file in the compiled program itself. When run, the program should be able to create a new file from the data that was embedded in it.

Program.c is compiled together with picture.jpg
The output file, Program, now has picture.jpg inside it
Program runs and creates a new file called pic.jpg
Program reads somewhere(?) in its own code to find the data of picture.jpg
Program writes the data to pic.jpg
Now pic.jpg and picture.jpg have identical contents.

I did some reading and found a little article that explains it: https://codeplea.com/embedding-files-in-c-programs

Towards the bottom at "Alternative - Linking the Blob in Directly"

The question I am having is, how does

extern const char binary_some_file_jpg_start[];
extern const char binary_some_file_jpg_end[];

tell me where my data is? Could those have been any arbitrary names, and would the program still know where to find the data?

Upvotes: 0

Views: 765

Answers (1)

Jonathan Leffler

Reputation: 754860

Having gone to read the article (which shouldn't be necessary; your question should stand on its own), it is not clear to me how the arrays are determined. The article uses

gcc -c my_program.c -o my_program.o
ld -r -b binary -o some_file.o some_file.jpg
gcc my_program.o some_file.o -o my_program

to embed the JPG file some_file.jpg into the object file (some_file.o).

The GNU ld manual page covers -b binary. Empirically, after creating a (binary) object file from image.jpg, you get some symbols defined:

$ ld -r -b binary -o image.o image.jpg
$ nm image.o
00000000000442d0 D _binary_image_jpg_end
00000000000442d0 A _binary_image_jpg_size
0000000000000000 D _binary_image_jpg_start
$

The output of the ld -r -b binary … operation is an 'object file' that contains one .data section and defines three symbols. The symbol names are determined by ld from the name of the binary file it is given to process, so they are not arbitrary: the extern declarations in your code must match them.

$ objdump -h image.o

image.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         000442d0  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
$ objdump -t image.o

image.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 g       .data  0000000000000000 _binary_image_jpg_start
00000000000442d0 g       .data  0000000000000000 _binary_image_jpg_end
00000000000442d0 g       *ABS*  0000000000000000 _binary_image_jpg_size

$

Note that those symbols have an underscore at the start. You can use this code to print out the addresses and the size. It uses the t length modifier for the ptrdiff_t type, which is the type of the result of subtracting two pointers:

#include <stdio.h>

extern char _binary_image_jpg_start[];
extern char _binary_image_jpg_end[];

int main(void)
{
    printf("Image start: %p\n", (void *)_binary_image_jpg_start);  /* %p expects void * */
    printf("Image end:   %p\n", (void *)_binary_image_jpg_end);
    printf("Image size:  0x%tx\n", _binary_image_jpg_end - _binary_image_jpg_start);
    return 0;
}

Example output:

$ gcc -o image main.c image.o
$ ./image
Image start: 0x6008e8
Image end:   0x644bb8
Image size:  0x442d0
$

The image size calculated corresponds to the size given by nm image.o. So the code could go on to read the data from the array _binary_image_jpg_start.

This depends on features of the GNU ld program. It may not work with any other ld, unless that program emulates GNU ld.

Demo created on antique RHEL 5 (2.6.18-128.el5 #1 SMP, dated 2008-12-17) using GCC 9.2.0 and GNU ld (GNU Binutils) 2.25.1.

Upvotes: 2
