Reputation: 854
I have a linked list of structures. Let's say I insert x million nodes into the linked list, then I iterate through all nodes to find a given value.
The strange thing is (for me at least), if I have a structure like this:
struct node
{
    int a;
    node *nxt;
};
Then I can iterate through the list and check the value of a ten times faster than when I have another member in the struct, like this:
struct node_complex
{
    int a;
    string b;
    node_complex *nxt;
};
I also tried it with C-style strings (char arrays), and the result was the same: just because I had another member (the string), the whole iteration (plus value check) was 10 times slower, even though I never touched that member! Now, I do not know how the internals of structures work, but it looks like a high price to pay...
What is the catch?
Edit: I am a beginner and this is the first time I use pointers, so chances are, the mistake is on my part. I will post the code ASAP (not being at home now).
Update: I checked the values again, and I now see a much smaller difference: 2x instead of 10x. That is much more reasonable for sure.
While it is certainly possible it was the case yesterday too and I was just so freaking tired last night I could not divide two numbers, I have just made more tests and the results are mind blowing.
The times for the same number of nodes are:
Look what happens when there are more than two strings in the structure! It gets faster! Did somebody drop LSD into my coffee? No! I do not drink coffee.
It is way too fckd up for my brain at the mo' so I think I will just figure it out on my own instead of draining public resources here at SO.
(Ad: I do not think my profiling class is buggy, and anyway I can see the time difference with my own eyes).
Anyhow, thanks for the help. Cheers.
Upvotes: 4
Views: 810
Reputation: 247909
Most likely, the issue is that your larger struct no longer fits inside a single cache line.
As I recall, mainstream CPUs typically use a cache line of 32 bytes (64 bytes is the common size on current x86 chips). This means that data is read into the cache one whole line at a time, and if you move past the end of a line, a second memory fetch is required.
Looking at your struct, it starts with an int, accounting for 4 bytes (usually), and then a std::string (I assume, even though the namespace isn't specified), which in my standard library implementation (from VS2010) takes up 28 bytes, which gives us 32 bytes total. With 32-byte cache lines, that means the initial int and the nxt pointer end up in different cache lines, using twice as much cache space and requiring twice as many memory accesses if both members are accessed during iteration.
If only the pointer is accessed, this shouldn't make a difference, though, as only the second cache line then has to be retrieved from memory.
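To check this on your own compiler rather than trusting my numbers, here is a minimal sketch that measures where each member sits inside the struct. The names member_offset, line_of, key_and_link_share_line and kAssumedCacheLine are made up for illustration, and the 64-byte default is an assumption you should replace with your CPU's real line size. It also assumes the node starts at a cache line boundary, which a general-purpose allocator does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same member order as the larger struct in the question.
struct node_complex {
    int a;
    std::string b;
    node_complex *nxt;
};

// Assumed cache line size in bytes (hypothetical default, check your CPU).
const std::size_t kAssumedCacheLine = 64;

// Byte distance from the start of obj to one of its members.
template <class T, class M>
std::size_t member_offset(const T &obj, const M &member) {
    const char *base = reinterpret_cast<const char *>(&obj);
    const char *at = reinterpret_cast<const char *>(&member);
    return static_cast<std::size_t>(at - base);
}

// Which cache line an offset falls into, for a node that starts on a line boundary.
inline std::size_t line_of(std::size_t offset, std::size_t line_size) {
    assert(line_size > 0);
    return offset / line_size;
}

// True if a and nxt would share a cache line under the given line size.
inline bool key_and_link_share_line(std::size_t line_size = kAssumedCacheLine) {
    node_complex n{};
    return line_of(member_offset(n, n.a), line_size) ==
           line_of(member_offset(n, n.nxt), line_size);
}
```

Printing member_offset(n, n.nxt) and sizeof(node_complex) for your build tells you how far apart the two members you actually touch really are.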
If you always access the int and the pointer, and the string is required less often, reordering the members may help:
struct node_complex
{
    int a;
    node_complex *nxt;
    string b;
};
In this case, the nxt pointer and the int are located next to each other, on the same cache line, so they can be read without requiring additional memory reads. But then you incur the additional cost once you need to read the string.
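A rough way to confirm the effect of the reordering is to measure how many bytes from the start of the node a search has to touch. This is only a sketch: node_reordered and hot_prefix_bytes are hypothetical names, and the exact number depends on your compiler's padding rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reordered layout: the two members the search touches come first.
struct node_reordered {
    int a;
    node_reordered *nxt;
    std::string b;
};

// Number of bytes from the start of the node up to the end of nxt,
// i.e. the part of the node a search on a actually reads.
inline std::size_t hot_prefix_bytes() {
    node_reordered n{};
    const char *base = reinterpret_cast<const char *>(&n);
    const char *end_of_nxt =
        reinterpret_cast<const char *>(&n.nxt) + sizeof(n.nxt);
    assert(end_of_nxt > base);
    return static_cast<std::size_t>(end_of_nxt - base);
}
```

On typical 32-bit and 64-bit builds this comes out to 8 or 16 bytes, so the int and the pointer fit in one line whenever the node itself does not straddle a line boundary.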
Of course, it's also possible that your benchmarking code includes creation of the nodes, or (intentional or otherwise) copies being created of the nodes, which would obviously also affect performance.
Upvotes: 3
Reputation: 12933
This may also happen because you create a copy of each structure during the iteration. That is:
node* pHead;
// ...
for (node* p = pHead; p; p = p->nxt)
{
    node myNode = *p; // here you create a copy!
    // ...
}
Copying a simple structure is very fast. But the member you've added is a string, which is a complex object. Copying it is a relatively complex operation, with heap access.
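For comparison, here is a sketch of the same search written both ways. I'm guessing at what your loop looks like, so find_by_copy, find_by_pointer and check_agree are hypothetical names, not your code. Both functions return the same answer; the first copies the string on every step, the second only reads a and nxt.

```cpp
#include <cassert>
#include <string>

struct node_complex {
    int a;
    std::string b;
    node_complex *nxt;
};

// Copies every node it visits, including the std::string member.
inline bool find_by_copy(const node_complex *head, int value) {
    for (const node_complex *p = head; p; p = p->nxt) {
        node_complex copy = *p;  // full copy, string included
        if (copy.a == value) return true;
    }
    return false;
}

// Reads the node in place through the pointer; nothing is copied.
inline bool find_by_pointer(const node_complex *head, int value) {
    for (const node_complex *p = head; p; p = p->nxt) {
        if (p->a == value) return true;
    }
    return false;
}

// The two loops must agree on the result; only their cost differs.
inline void check_agree(const node_complex *head, int value) {
    assert(find_by_copy(head, value) == find_by_pointer(head, value));
}
```

If you need a named local for readability, binding a reference (const node_complex &cur = *p;) avoids the copy as well.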
Upvotes: 5
Reputation: 37427
It must be related to memory access. You speak of a million linked elements. With just an int and a pointer, each node takes 8 bytes (assuming 32-bit pointers). A million nodes take up 8 MB of memory, which is around the size of a CPU cache.
When you add other members, you increase the overall size of your data. It no longer fits entirely in the cache, so you fall back to plain memory accesses, which are much slower.
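The arithmetic can be written down as a small sketch. The helper names are hypothetical, the cache size is whatever your CPU has (you have to supply it), and per-node allocator overhead is ignored, so real numbers will be somewhat larger.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Total bytes occupied by node_count nodes of node_size bytes each
// (ignores any bookkeeping the heap allocator adds per node).
inline std::size_t working_set_bytes(std::size_t node_count,
                                     std::size_t node_size) {
    // Guard against overflow of the multiplication.
    assert(node_size == 0 ||
           node_count <= std::numeric_limits<std::size_t>::max() / node_size);
    return node_count * node_size;
}

// True if the whole list would fit in a cache of cache_bytes bytes.
inline bool fits_in_cache(std::size_t node_count, std::size_t node_size,
                          std::size_t cache_bytes) {
    return working_set_bytes(node_count, node_size) <= cache_bytes;
}
```

With a million 8-byte nodes the list is 8,000,000 bytes and just fits an 8 MiB cache; at 36 bytes per node (4 + 28 + 4 with a 28-byte string) it is 36,000,000 bytes and does not.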
Upvotes: 7
Reputation: 69672
I'm not a specialist at all, but the "cache miss" problem rings in my head while reading your question.
When you add a member, the structure gets bigger, which may also cause more cache misses when going through the linked list (which is naturally cache-unfriendly unless the nodes are allocated in one block, close to each other in memory).
I can't find another explanation.
However, you haven't provided the creation code or the loop, so it's still hard to tell whether your code simply doesn't traverse the list efficiently.
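On the point about nodes allocated in one block: one way to get that, as a sketch only, is to keep all the nodes in a std::vector and link them by address. build_contiguous and find_node are names I made up, and the approach assumes the vector outlives the list and is never resized afterwards, since resizing would invalidate every nxt pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct node {
    int a;
    node *nxt;
};

// Fills pool with n nodes stored back to back and links them in order.
// Returns the head of the list, or nullptr when n is 0.
inline node *build_contiguous(std::vector<node> &pool, std::size_t n) {
    pool.assign(n, node{0, nullptr});
    assert(pool.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        pool[i].a = static_cast<int>(i);
        pool[i].nxt = (i + 1 < n) ? &pool[i + 1] : nullptr;
    }
    return n ? pool.data() : nullptr;
}

// Plain linear search through the list, reading nodes in place.
inline const node *find_node(const node *head, int value) {
    for (const node *p = head; p; p = p->nxt) {
        if (p->a == value) return p;
    }
    return nullptr;
}
```

Traversal then walks memory front to back, which is about as cache-friendly as a linked list gets.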
Upvotes: 1
Reputation: 6233
Perhaps a solution would be a linked list of pointers to your objects. It may make things more complicated (unless you use smart pointers, etc.), but it may improve search time.
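One possible reading of this, and it is only my interpretation, is to keep the value you search on inside the node and move the heavy members behind a pointer, so the nodes themselves stay small. The types payload and slim_node and the helpers find_slim and payload_of are hypothetical names for the sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// The rarely used, heavy part of each element.
struct payload {
    std::string b;
};

// Small node: the search key, a smart pointer to the heavy part, and the link.
struct slim_node {
    int a;
    std::unique_ptr<payload> data;
    slim_node *nxt;
};

// Search reads only a and nxt; the payload is never dereferenced here.
inline const slim_node *find_slim(const slim_node *head, int value) {
    for (const slim_node *p = head; p; p = p->nxt) {
        if (p->a == value) return p;
    }
    return nullptr;
}

// Fetches the string of a node that was found; costs one extra indirection.
inline const std::string &payload_of(const slim_node *n) {
    assert(n && n->data);
    return n->data->b;
}
```

The trade-off is one extra pointer hop (and one extra allocation per element) whenever the string is actually needed.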
Upvotes: 0