bholanath

Reputation: 1753

How much data-section memory is loaded along with the code section in C?

I have created a shared library that contains a static const data section and a code section with 4 functions.

Here are the details of my static const data section:

static const u32 T0[256] = { /* 256 non-zero values */ };
static const u32 T1[256] = { /* 256 non-zero values */ };
static const u32 T2[256] = { /* 256 non-zero values */ };
static const u32 T3[256] = { /* 256 non-zero values */ };
static const u32 T4[256] = { /* 256 non-zero values */ };
static const u32 T5[256] = { /* 256 non-zero values */ };
static const u32 T6[256] = { /* 256 non-zero values */ };
static const u32 T7[256] = { /* 256 non-zero values */ };
static const u32 T8[256] = { /* 256 non-zero values */ };
static const u32 T9[10] = { /* 10 non-zero values */ };

The functions are defined just below the static const section:

int A();
int B();

// Accesses different indices of the T0 - T3 tables
void C();

void D();

As per my understanding, the text/code section contains the executable instructions, whereas the data section contains initialized static data (for simplicity, in my case all of it is static const).

Inside C(), different indices of T0, T1, T2 and T3 are accessed in random order.

Intentionally, inside C(), I have not accessed T0[0].

However, every time I call C(), T0[0] is loaded, irrespective of whether T0[0] is accessed inside C() or not.

My question is: how much adjacent memory of the data section is loaded along with the code section?

I was thinking that maybe a whole 4 KB page is loaded, so that every time C() is called the whole 4 KB page is loaded and T0[0] is therefore loaded along with it.

But the experimental results show that this idea is not correct.

Let me explain in detail below.

I have calculated the distance between function C() and the different static const tables as follows:

Base address of C - Base address of T0[0] = 3321 bytes
Base address of C - Base address of T1[0] = 4345 bytes
Base address of C - Base address of T2[0] = 5369 bytes
Base address of C - Base address of T3[0] = 6393 bytes

So, whenever C() is invoked, only 64 bytes (i.e. T0[0]) are loaded. T0[1], T0[2], ..., the parts of the T0 array that also belong to the same 4 KB page as C(), are not loaded (if the whole 4 KB page were loaded, these would have to be loaded too, but the experimental results show they were not). Hence my idea that the whole 4 KB page is loaded is wrong.

EDIT 1: Based on the comments of @nemetroid, C() and T0[0] may belong to different pages. That is why I am adding the base addresses of both C() and T0[0] here.

Base address of T0[0] = 0xB7758D40,
Base address of C = 0xB7758047 (in this case T0[0] is loaded).

In another experiment, when I add another static const of 64 bytes (like static const int DATA=10;) before static const u32 T0[256] = {.....}

these distances become:

Base address of C - Base address of T0[0] = 3321 + 64 bytes = 3385 bytes [= 64 + previous distance to T0[0]]
Base address of C - Base address of T1[0] = 4345 + 64 bytes = 4409 bytes [= 64 + previous distance to T0[0] + 1024]
Base address of C - Base address of T2[0] = 5369 + 64 bytes = 5433 bytes [= 64 + previous distance to T0[0] + 2*1024]
Base address of C - Base address of T3[0] = 6393 + 64 bytes = 6457 bytes [= 64 + previous distance to T0[0] + 3*1024]

EDIT 1:

Base address of T0[0] = 0xB775CD80 (just shifted by 64 bytes),
Base address of C = 0xB775C047 (in this case T0[0] is not loaded).

Now, although T0[0] is still in the same 4 KB page as C() (only shifted by 64 bytes), it is not loaded when C() is invoked. So here again I can't say that the whole 4 KB page was loaded along with C().

Can you help me understand why T0[0] is always loaded whenever C() is invoked, although it is not accessed inside C()?

A link that explains the memory layout of a program and how much memory is loaded during execution would also help.

I am using Debian and the gcc compiler.

Note: To determine whether T0[0] is loaded, I flush T0[0] with the clflush instruction before invoking C(), and after C() returns I measure the access time to T0[0] using rdtsc.

EDIT 1: I am using an Intel Core i3 machine.
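The flush-then-time measurement described in this note could look roughly like the sketch below. It is only an illustration, not the code actually used in the question: the helper names flush_line and timed_load are hypothetical, the rdtscp variant is used instead of plain rdtsc for slightly better ordering, and it assumes an x86 CPU with GCC or Clang (on 32-bit targets it likely needs -msse2).

```c
#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp (x86 only) */

/* Hypothetical helper: evict the cache line that holds *p from the caches. */
static void flush_line(const void *p)
{
    _mm_clflush(p);
    _mm_mfence();            /* wait until the flush is globally visible */
}

/* Hypothetical helper: time a single load from *p in TSC ticks.
 * A small result suggests the line was already cached; a large one
 * suggests it had to be fetched from memory. */
static uint64_t timed_load(const volatile uint32_t *p)
{
    unsigned aux;            /* receives IA32_TSC_AUX, unused here */
    _mm_lfence();            /* keep earlier instructions out of the timed region */
    uint64_t start = __rdtscp(&aux);
    (void)*p;                /* the load being timed (volatile forces the access) */
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}
```

In terms of the experiment above, the sequence would be flush_line(&T0[0]), then the call to C(), then timed_load(&T0[0]). The threshold that separates a cached from an uncached access depends on the machine and has to be calibrated.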

Size of L1 = 32 KB, 8-way associative, cache line size = 64 bytes
Size of L2 = 256 KB, 8-way associative, cache line size = 64 bytes
Size of L3 = 3 MB, 12-way associative, cache line size = 64 bytes

Upvotes: 0

Views: 157

Answers (2)

JSF

Reputation: 5321

The whole cache line is loaded when any part of it is loaded. You seem to be describing a situation in which the beginning of the data is in the same cache line as part of the code. I'm surprised it would be linked/loaded that way, but it should be easy to confirm (or contradict) with a debugger. The cache line size and other details of this behavior will vary by CPU model.

Looking at the base address of C() was easier but less informative. What matters is the address of the highest instruction of C() fetched (not necessarily executed). I expect that is in the same cache line as the start of data.
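Whether two addresses fall into the same cache line can be checked with a little arithmetic once the addresses are known. The following is a minimal sketch assuming 64-byte lines (the size reported in the question) and 32-bit addresses; line_base and same_line are hypothetical helper names.

```c
#include <stdint.h>

#define LINE_BYTES 64u   /* assumption: 64-byte cache lines, as reported in the question */

/* Start address of the cache line that contains addr. */
static uint32_t line_base(uint32_t addr)
{
    return addr & ~(LINE_BYTES - 1u);
}

/* Nonzero if both addresses lie in the same cache line. */
static int same_line(uint32_t a, uint32_t b)
{
    return line_base(a) == line_base(b);
}
```

The useful comparison is between the address of the last instruction fetched in C() (taken from a debugger or a disassembly) and the address of T0[0]. For the first pair of addresses in the question, line_base(0xB7758047) is 0xB7758040, while T0[0] at 0xB7758D40 is itself 64-byte aligned.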

Upvotes: 0

nemetroid

Reputation: 2159

Pages are aligned to the page size. So, if you have a page size of 4 kB, 0xAABBC000 - 0xAABBCFFF belong to the same page, as do 0xAABBD000 - 0xAABBDFFF, etc.

So if C has address 0xAABBCF00 and T0 has address 0xAABBD000, the difference between their addresses is less than 4 kB, but they still belong to different pages. However, T0[0] and T0[1] almost certainly belong to the same page. Try running objdump -h a.out.
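The page arithmetic above can be written out as a small sketch. It assumes 4 kB pages and 32-bit addresses; page_number, page_offset and same_page are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumption: 4 kB pages */

/* Page number: the address with its low 12 offset bits dropped. */
static uint32_t page_number(uint32_t addr)
{
    return addr / PAGE_BYTES;
}

/* Offset of the address within its page. */
static uint32_t page_offset(uint32_t addr)
{
    return addr % PAGE_BYTES;
}

/* Nonzero if both addresses lie in the same page. */
static int same_page(uint32_t a, uint32_t b)
{
    return page_number(a) == page_number(b);
}
```

With the example addresses, same_page(0xAABBCF00, 0xAABBD000) is false even though the two are only 0x100 bytes apart, because the first lies in page 0xAABBC and the second in page 0xAABBD. Applied to the addresses posted in the question, 0xB7758047 and 0xB7758D40 both have page number 0xB7758.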

Upvotes: 1
