Reputation: 423
I am writing a program that should read in a file and store its contents inside the program itself. When run, the program should then be able to create a new file from the data that was embedded in it.
Program.c is compiled together with picture.jpg
Program is now the output file, with picture.jpg inside it
Program runs and then creates a new file called pic.jpg
Program reads somewhere(?) in its own code to find the data of the picture.jpg
Program writes the data to pic.jpg
Now pic.jpg and picture.jpg are the same file.
I did some reading and found a little article that explains it: https://codeplea.com/embedding-files-in-c-programs
Towards the bottom at "Alternative - Linking the Blob in Directly"
My question is: how do
extern const char binary_some_file_jpg_start[];
extern const char binary_some_file_jpg_end[];
tell me where my data is? Could those have been arbitrary names, with the program still knowing where to find the data?
Upvotes: 0
Views: 765
Reputation: 754860
Having gone to read the article (which shouldn't be necessary; your question should stand on its own), it is not clear to me how the arrays are determined. The article uses
gcc -c my_program.c -o my_program.o
ld -r -b binary -o some_file.o some_file.jpg
gcc my_program.o some_file.o -o my_program
to embed the JPG file some_file.jpg into the object file (some_file.o).
The GNU ld manual page covers -b binary. Empirically, after creating a (binary) object file from image.jpg, you get some symbols defined:
$ ld -r -b binary -o image.o image.jpg
$ nm image.o
00000000000442d0 D _binary_image_jpg_end
00000000000442d0 A _binary_image_jpg_size
0000000000000000 D _binary_image_jpg_start
$
The output of the ld -r -b binary … operation is an 'object file' that contains one .data section and defines three symbols. The symbol names are determined by ld from the name of the binary file it is given to process.
$ objdump -h image.o
image.o: file format elf64-x86-64
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .data         000442d0  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
$ objdump -t image.o
image.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_image_jpg_start
00000000000442d0 g .data 0000000000000000 _binary_image_jpg_end
00000000000442d0 g *ABS* 0000000000000000 _binary_image_jpg_size
$
Note that those symbols begin with an underscore, unlike the names in your question. You can use this code to print out the addresses and the size. It uses the t length modifier (in %tx) for the ptrdiff_t type, which is the type of the result of subtracting two pointers:
#include <stdio.h>
extern char _binary_image_jpg_start[];
extern char _binary_image_jpg_end[];
int main(void)
{
    /* %p expects a void *, so cast the array addresses explicitly. */
    printf("Image start: %p\n", (void *)_binary_image_jpg_start);
    printf("Image end: %p\n", (void *)_binary_image_jpg_end);
    printf("Image size: 0x%tx\n", _binary_image_jpg_end - _binary_image_jpg_start);
    return 0;
}
Example output:
$ gcc -o image main.c image.o
$ ./image
Image start: 0x6008e8
Image end: 0x644bb8
Image size: 0x442d0
$
The calculated image size (0x442d0) matches the value that nm image.o reports for _binary_image_jpg_size. So the code could go on to read the data from the array _binary_image_jpg_start.
This depends on features of the GNU ld program. It may not work with any other ld, unless that program emulates GNU ld.
Demo created on antique RHEL 5 (2.6.18-128.el5 #1 SMP, dated 2008-12-17) using GCC 9.2.0 and GNU ld (GNU Binutils) 2.25.1.
Upvotes: 2