zeropoint

Reputation: 245

Allocations for #define and const

Say I have a macro

#define MSG "Input your first name"

and a const

const char* const msg = "Input your last name" or const std::string msg = "Input your last name"

in the same program.

Now, the msg string literal has a single memory location, and every use of msg in the program refers to it.

But does the same apply to MSG? That is, does every occurrence of MSG refer to the same string literal, or is a separate string literal created for each occurrence?

My guess is that, since macros are handled by the preprocessor, duplicate string literals might be created (I'm not 100% sure). Is that true? I am sure that duplication won't matter if it's an integral type.

My question is specific to storage in memory, but other aspects are also welcome.

In other words, if I use msg 100 times, the memory used is constant. If MSG is used 100 times, is the memory use still constant, or is it 100 times as large?

Upvotes: 1

Views: 240

Answers (6)

Jim Balter

Reputation: 16406

But does the same apply to MSG, i.e., does every occurrence of MSG refer to same string literal 

The question is meaningless because MSG does not "refer to" anything. The preprocessor simply does token replacement ... where you type MSG it's just as if you had typed "Input your first name" instead. So what memory is used depends on where you type it; e.g.,

const char* a = MSG;
const char* b = MSG;
const char* c = "Input your first name";

produces one copy of the string (in a typical implementation that uses a string pool, but the standard doesn't require it), but

char a[] = MSG;
char b[] = MSG;
char c[] = "Input your first name";

produces three copies of the string. (Although, depending on exactly how you use them, the compiler might optimize them into one or two copies, or even no copies.)
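As a minimal sketch of that difference (the function names are made up for illustration): the arrays are separate objects, so their addresses are guaranteed to differ, while the pointer comparison can go either way depending on whether the implementation pools literals.

```cpp
#include <cassert>
#include <cstring>

#define MSG "Input your first name"

// Each array is its own object, initialized with a copy of the literal's
// characters, so the three arrays always live at distinct addresses.
bool arrays_are_distinct() {
    char a[] = MSG;
    char b[] = MSG;
    char c[] = "Input your first name";
    assert(std::strcmp(a, b) == 0 && std::strcmp(b, c) == 0);  // same contents
    const char* pa = a;
    const char* pb = b;
    const char* pc = c;
    return pa != pb && pb != pc && pa != pc;
}

// The pointers may or may not compare equal: whether identical literals
// share storage is left to the implementation, so do not rely on the result.
bool pointers_share_storage() {
    const char* a = MSG;
    const char* b = MSG;
    return a == b;
}
```

Calling arrays_are_distinct() from main should always give true; pointers_share_storage() is only informative about the compiler and flags in use.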

Additionally, consider

const char* twice = MSG MSG;

which allocates one string containing the text of MSG twice. I think this shows most clearly that the notion that MSG "refers to" something is a misconception ... your question conflates two quite different issues, macro expansion and string pooling.
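A small sketch of the concatenation case, assuming the same MSG as above (twice_text and twice_length are hypothetical helper names): adjacent string literals are joined into one literal during translation, so the result is a single string twice as long.

```cpp
#include <cassert>
#include <cstring>

#define MSG "Input your first name"

// After macro expansion this is "Input your first name" "Input your first name",
// which the compiler joins into one literal before anything is stored.
static_assert(sizeof(MSG MSG) == 2 * sizeof(MSG) - 1,
              "one joined literal with a single terminating null");

// Returns the joined literal.
const char* twice_text() { return MSG MSG; }

// Length of the joined literal, not counting the terminating null.
std::size_t twice_length() { return std::strlen(twice_text()); }
```

The static_assert is checked at compile time, so the file compiling at all already confirms that only one terminating null is present.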

Upvotes: 1

Angel.King.47

Reputation: 7994

With g++ on Linux, MSG and a const char* refer to the same location if the content is the same.

Given:

#include <stdio.h>

#define MSG "Input your last name"

int main()
{
    const char* const msgc = "Input your last name";

    printf("MACRO %p\n", &MSG);
    printf("char %p\n", msgc);
    printf("MACRO %p\n", &MSG);
}

The disassembly of the above

(gdb) disassemble main
Dump of assembler code for function main():
   0x000000000040070c <+0>:     push   rbp
   0x000000000040070d <+1>:     mov    rbp,rsp
   0x0000000000400710 <+4>:     sub    rsp,0x10
   0x0000000000400714 <+8>:     mov    QWORD PTR [rbp-0x8],0x400864
   0x000000000040071c <+16>:    mov    esi,0x400864
   0x0000000000400721 <+21>:    mov    edi,0x400879
   0x0000000000400726 <+26>:    mov    eax,0x0
   0x000000000040072b <+31>:    call   0x4005c0 <printf@plt>
   0x0000000000400730 <+36>:    mov    esi,0x400864
   0x0000000000400735 <+41>:    mov    edi,0x400883
   0x000000000040073a <+46>:    mov    eax,0x0
   0x000000000040073f <+51>:    call   0x4005c0 <printf@plt>
   0x0000000000400744 <+56>:    mov    esi,0x400864
   0x0000000000400749 <+61>:    mov    edi,0x400879
   0x000000000040074e <+66>:    mov    eax,0x0
   0x0000000000400753 <+71>:    call   0x4005c0 <printf@plt>
   0x0000000000400758 <+76>:    mov    eax,0x0
   0x000000000040075d <+81>:    leave  
   0x000000000040075e <+82>:    ret    
End of assembler dump.

0x400864 in this case is the address of "Input your last name", and both msgc and MSG point to that same location.

Upvotes: 0

Each place in code where you use the macro MSG will contain the literal "Input your first name" after preprocessing. However, whether this text will be present in the binary several times or just once depends entirely on your compiler. Quoting [lex.string]§12:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.

In other words, the compiler (and/or linker) is free to put the text data into the binary image just once, and have all appearances of the literal in code refer to the same data.

Upvotes: 1

paulm

Reputation: 5882

If the string is repeated 100 times in the binary then the size of the binary in memory will be greater - but it won't affect the amount of used heap.

As for whether the string will be repeated 100 times when using a #define: yes, it certainly will be in the source, as you can see if you view the preprocessor output. However, some compilers may then remove the duplicates in a later step (linking, I would assume). This feature is called string pooling; the MSVC reference is here:

http://msdn.microsoft.com/en-us/library/s0s0asdt(v=vs.110).aspx

Upvotes: 4

Tony The Lion

Reputation: 63190

A macro gets replaced by its actual content, every place it occurs, by the preprocessor. So by the time the compiler gets to your code, your MSG will have been replaced with the actual string every time it occurred, meaning that this string will be hardcoded in your code base.

What the compiler then does with multiple occurrences of the same string depends on compiler settings etc., but it will probably store it once and then refer to it wherever it occurs.
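If you want one copy regardless of compiler settings, one option is a named array, which is a single object by definition. This is only a sketch; kMsg, first_use and second_use are made-up names.

```cpp
#include <cassert>

// One named object holding the text. Every use of kMsg refers to this same
// array, with or without string pooling.
static const char kMsg[] = "Input your first name";

// Two unrelated call sites that both hand out the same storage.
const char* first_use() { return kMsg; }
const char* second_use() { return kMsg; }
```

Here first_use() == second_use() holds on any conforming compiler, because both return the address of the same object instead of two separately written literals.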

Upvotes: 3

user529758

Reputation:

My guess is that since macros are handled by Preprocessor, duplicate string literals might be created

They might, but in practice almost every modern compiler will merge identical string literals into one, so that every instance of "foo" will indeed have the same memory address. However common this is among optimizing compilers, though, don't rely on it.
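To make "don't rely on it" concrete, here is a rough sketch (same_text and same_object are hypothetical helpers): compare the characters when you care whether two strings are equal, and treat address equality as an accident of the build.

```cpp
#include <cassert>
#include <cstring>

#define MSG "Input your first name"

// Portable: compares the characters, so the answer is the same whether or
// not the compiler merged the literals.
bool same_text(const char* lhs, const char* rhs) {
    return std::strcmp(lhs, rhs) == 0;
}

// Not portable for two separately written literals: true only if both
// pointers happen to refer to the same storage.
bool same_object(const char* lhs, const char* rhs) {
    return lhs == rhs;
}
```

same_text(MSG, MSG) is always true, whereas same_object(MSG, MSG) may be true or false depending on the compiler and its options.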

Upvotes: 0
