Reputation: 1023
I've been having a problem with allocating memory for one of my data structures. It always crashes out, but it's not always at the same place. My suspicion is that I'm trying to allocate it over the top of something that's already there, but I'm really not sure how to tell what's actually going on or how to fix it - I've tried to install valgrind, but that doesn't yet support Mac OS 10.10.
This is the code that calls the function.
stet::file f1;
f1.set_path("test/longfile1.txt"); // a file with almost 2 million lines
f1.read();
std::string all_text = f1.get_contents();
std::vector<chunk *> chunks = populate_chunks(all_text);
These are my data structures - the idea is that the text from file is split into fixed sized chunks, which are populated up to 75% capacity, but I can't seem to create all the chunks.
struct line {
std::string text;
};
struct chunk {
line *lines[MAX_CHUNK_SIZE];
};
And this is the cause of my nightmares - it crashes out on the line below all the comments.
std::vector<chunk *> populate_chunks(std::string &text) {
std::vector<std::string> all_lines;
boost::split(all_lines, text, boost::is_any_of("\n"));
size_t num_lines = all_lines.size();
std::vector<chunk *> chunks = std::vector<chunk *>( (num_lines / START_CHUNK_SIZE) * 2 );
size_t next_line_num;
for(size_t line_num = 0; line_num < num_lines; line_num = next_line_num) {
next_line_num = line_num + START_CHUNK_SIZE;
std::cout << line_num << std::endl;
chunk *c = new chunk;
chunks.push_back(c);
// This always falls over, but not always at the same point in the file.
// Never seems to be the first time. Observed range: 3072 - 59904
// Error always looks something like this:
// text(71184,0x7fff77699300) malloc: *** error for object 0x7ff389006208: incorrect checksum for freed object - object was probably modified after being freed.
// *** set a breakpoint in malloc_error_break to debug
for(size_t i = 0; i < next_line_num; ++i) {
line *l = new line;
l->text = all_lines[line_num+i];
c->lines[i] = l;
}
}
return chunks;
}
If anyone has any ideas, they'd be much appreciated - it should be noted that I'm pretty new to C++, so it's quite likely that I've missed something really stupid.
Update:
I've fiddled around with the code to change things based on the comments I've been getting:
Macro values:
#define MAX_CHUNK_SIZE 1024
#define START_CHUNK_SIZE MAX_CHUNK_SIZE * 0.75
Valgrind output after the above changes:
==24468== Memcheck, a memory error detector
==24468== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==24468== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==24468== Command: bin/text
==24468==
==24468== Invalid write of size 8
==24468== at 0x402907: populate_chunks(std::string&) (text_storage.cc:125)
==24468== by 0x402ADF: main (text_storage.cc:173)
==24468== Address 0x216b5640 is 0 bytes after a block of size 8,192 alloc'd
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x402888: populate_chunks(std::string&) (text_storage.cc:113)
==24468== by 0x402ADF: main (text_storage.cc:173)
==24468==
==24468==
==24468== Process terminating with default action of signal 11 (SIGSEGV)
==24468== Access not within mapped region at address 0x37D77000
==24468== at 0x402907: populate_chunks(std::string&) (text_storage.cc:125)
==24468== by 0x402ADF: main (text_storage.cc:173)
==24468== If you believe this happened as a result of a stack
==24468== overflow in your program's main thread (unlikely but
==24468== possible), you can try to increase the size of the
==24468== main thread stack using the --main-stacksize= flag.
==24468== The main thread stack size used in this run was 8388608.
==24468==
==24468== HEAP SUMMARY:
==24468== in use at exit: 371,641,698 bytes in 6,241,143 blocks
==24468== total heap usage: 6,241,190 allocs, 47 frees, 656,880,685 bytes allocated
==24468==
==24468== 16 bytes in 2 blocks are possibly lost in loss record 1 of 11
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x4028BC: populate_chunks(std::string&) (text_storage.cc:123)
==24468== by 0x402ADF: main (text_storage.cc:173)
==24468==
==24468== 43 bytes in 1 blocks are possibly lost in loss record 2 of 11
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x5340048: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x5341900: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x5341D37: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x4029F7: main (text_storage.cc:138)
==24468==
==24468== 35,727,800 (33,173,592 direct, 2,554,208 indirect) bytes in 4,146,699 blocks are definitely lost in loss record 8 of 11
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x4028BC: populate_chunks(std::string&) (text_storage.cc:123)
==24468== by 0x402ADF: main (text_storage.cc:173)
==24468==
==24468== 93,350,023 bytes in 1 blocks are possibly lost in loss record 9 of 11
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x5340048: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x5340235: std::string::_M_mutate(unsigned long, unsigned long, unsigned long) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x53403C5: std::string::_M_leak_hard() (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x5340412: std::string::begin() (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x407728: boost::range_iterator<std::string>::type boost::range_detail::range_begin<std::string>(std::string&) (begin.hpp:49)
==24468== by 0x40705D: boost::range_iterator<std::string>::type boost::range_adl_barrier::begin<std::string>(std::string&) (begin.hpp:108)
==24468== by 0x4066FC: __gnu_cxx::__normal_iterator<char*, std::string> boost::iterator_range_detail::iterator_range_impl<__gnu_cxx::__normal_iterator<char*, std::string> >::adl_begin<std::string>(std::string&) (iterator_range_core.hpp:58)
==24468== by 0x40601A: boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::string> >::iterator_range<std::string>(std::string&, boost::iterator_range_detail::range_tag) (iterator_range_core.hpp:207)
==24468== by 0x40561F: boost::iterator_range<boost::range_iterator<std::string>::type> boost::make_iterator_range<std::string>(std::string&) (iterator_range_core.hpp:559)
==24468== by 0x404BC3: boost::iterator_range<boost::range_iterator<std::string>::type> boost::range_detail::make_range<std::string>(std::string&, long) (as_literal.hpp:93)
==24468== by 0x4040E5: boost::iterator_range<boost::range_iterator<std::string>::type> boost::as_literal<std::string>(std::string&) (as_literal.hpp:102)
==24468==
==24468== 93,351,904 bytes in 1 blocks are possibly lost in loss record 10 of 11
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x5340048: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x5341710: char* std::string::_S_construct<char*>(char*, char*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x531F9A7: std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::str() const (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x402523: stet::file::read() (file.cc:50)
==24468== by 0x402A2E: main (text_storage.cc:139)
==24468==
==24468== 129,441,960 bytes in 1,520,226 blocks are possibly lost in loss record 11 of 11
==24468== at 0x4C27965: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24468== by 0x5340048: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==24468== by 0x40845E: char* std::string::_S_construct<__gnu_cxx::__normal_iterator<char*, std::string> >(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, std::allocator<char> const&, std::forward_iterator_tag) (basic_string.tcc:138)
==24468== by 0x4082E8: char* std::string::_S_construct_aux<__gnu_cxx::__normal_iterator<char*, std::string> >(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, std::allocator<char> const&, std::__false_type) (basic_string.h:1725)
==24468== by 0x408177: char* std::string::_S_construct<__gnu_cxx::__normal_iterator<char*, std::string> >(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, std::allocator<char> const&) (basic_string.h:1746)
==24468== by 0x407FDA: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<__gnu_cxx::__normal_iterator<char*, std::string> >(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, std::allocator<char> const&) (basic_string.tcc:229)
==24468== by 0x407D6A: std::string boost::copy_range<std::string, boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::string> > >(boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::string> > const&) (iterator_range_core.hpp:643)
==24468== by 0x407AEA: boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >::operator()(boost::iterator_range<__gnu_cxx::__normal_iterator<char*, std::string> > const&) const (util.hpp:97)
==24468== by 0x407395: boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default>::dereference() const (transform_iterator.hpp:121)
==24468== by 0x406B72: boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default>::reference boost::iterator_core_access::dereference<boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default> >(boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default> const&) (iterator_facade.hpp:514)
==24468== by 0x40633D: boost::iterator_facade<boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default>, std::string, boost::forward_traversal_tag, std::string, long>::operator*() const (iterator_facade.hpp:639)
==24468== by 0x405895: void std::vector<std::string, std::allocator<std::string> >::_M_range_initialize<boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default> >(boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default>, boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, __gnu_cxx::__normal_iterator<char*, std::string> >, boost::algorithm::split_iterator<__gnu_cxx::__normal_iterator<char*, std::string> >, boost::use_default, boost::use_default>, std::input_iterator_tag) (stl_vector.h:1188)
==24468==
==24468== LEAK SUMMARY:
==24468== definitely lost: 33,173,592 bytes in 4,146,699 blocks
==24468== indirectly lost: 2,554,208 bytes in 319,276 blocks
==24468== possibly lost: 316,143,946 bytes in 1,520,231 blocks
==24468== still reachable: 19,769,952 bytes in 254,937 blocks
==24468== suppressed: 0 bytes in 0 blocks
==24468== Reachable blocks (those to which a pointer was found) are not shown.
==24468== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==24468==
==24468== For counts of detected and suppressed errors, rerun with: -v
==24468== ERROR SUMMARY: 4146708 errors from 7 contexts (suppressed: 2 from 2)
Upvotes: 2
Views: 982
Reputation: 1023
For reference, this is what the final code looks like:
std::vector<chunk *> populate_chunks(std::string &text) {
std::vector<std::string> all_lines;
boost::split(all_lines, text, boost::is_any_of("\n"));
size_t num_lines = all_lines.size();
std::vector<chunk *> chunks = std::vector<chunk *>( (num_lines / START_CHUNK_SIZE) * 2 );
for(size_t next_line_num, line_num = 0; line_num < num_lines; line_num = next_line_num) {
next_line_num = line_num + START_CHUNK_SIZE;
chunk *c = new chunk;
chunks.push_back(c);
for(size_t i = 0; i < std::min(static_cast<unsigned int>(START_CHUNK_SIZE), static_cast<unsigned int>(num_lines - line_num) ); ++i) {
line *l = new line;
l->text = all_lines[line_num+i];
c->lines[i] = l;
}
}
return chunks;
}
Upvotes: 0
Reputation: 5316
The size of the lines
array in a chunk
is MAX_CHUNK_SIZE
, but you are accessing it far beyond that on any iteration except the first.
Your loop is for(size_t i = 0; i < next_line_num; ++i)
, guess what next_line_num
is on your second (and beyond) iteration?
You would probably have totally avoided this problem if you had thought of another problem, which you overlooked. You only partially fill the chunks (by 75%), which makes sense. But on the last iteration you are likely to have even less lines than those needed to fill 75% of a chunk. Therefore there should be, somewhere, a test to handle this boundary. A comparison, somewhere in that loop, with num_lines
. Thinking about where to put it could (but not necessarily would) have alerted you that the iteration index is not doing what you expect.
Try for(size_t i = 0; i < START_CHUNK_SIZE && line_num+i < num_lines; ++i)
.
Upvotes: 3