Reputation: 848
I have a Python code base that sometimes invokes C++ programs to handle intensive workloads. One such program has to count all kmers of a certain size in a large text file. For each line it reads, it creates a temporary index that stores the positions of each kmer. Here is the function that processes each line:
void process_read(char* read, int num) {
    int l = strlen(read) ;
    std::string seq(read) ;
    // index kmers
    std::unordered_map<std::string, std::vector<int>> index ;
    for (int i = 0 ; i <= l - 1 - 15 ; i++) {
        std::string k = seq.substr(i, 15) ;
        if (global_index->find(k) == global_index->end()) {
            continue ;
        }
        if (index.find(k) == index.end()) {
            index.insert(std::make_pair(k, std::vector<int>(1, i))) ;
        } else {
            index[k].push_back(i) ;
        }
    }
    // 50+ lines of code commented out. It returns here
}
The code crashes every time it reaches a certain line of input:
ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCATAACCCTAAACCTCACGATAACCCAAACCATCACCAAAAAAAAAAAAAACACACCTACCGAAACCAACAACATA
Out of the kmers in this line, only AAAAAAAAAAAAAAC and CAAAAAAAAAAAAAA make it into index. The code always crashes when trying to insert CAAAAAAAAAAAAAA, for a reason I don't understand. My guess is that the problem is related to these two keys being inserted into the unordered_map in sequence. Changing the function to the following still produces the same crash when inserting the second key:
void process_read(char* read, int num) {
    std::unordered_map<std::string, std::vector<int>> index ;
    index.insert(std::make_pair("AAAAAAAAAAAAAAC", std::vector<int>(1, 2))) ;
    index.insert(std::make_pair("CAAAAAAAAAAAAAA", std::vector<int>(1, 2))) ;
}
This version clearly does not access any global state, unlike the original, so I assumed the problem had to be with these specific keys (notice that one is a circular shift of the other; perhaps the hash function handles that badly). However, putting this code at the start of the program, or writing a separate small program that does only this, does not reproduce the crash, so I'm really confused.
Any suggestion is appreciated.
Update: I get the following backtrace when the crash happens. For various reasons I can't use gdb to debug, so I guess this is the best I'm going to get, but I don't know how to interpret it.
*** Error in `src/python/kmer/c_counter.out': malloc(): memory corruption (fast): 0x0000000001eac690 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f189daa47e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x82651)[0x7f189daaf651]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f189dab1184]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x18)[0x7f189e3a3e78]
/src/python/kmer/c_counter.out[0x41c5e4]
/src/python/kmer/c_counter.out[0x4146ea]
/src/python/kmer/c_counter.out[0x41453a]
/src/python/kmer/c_counter.out[0x41035b]
/src/python/kmer/c_counter.out[0x40b3d8]
/src/python/kmer/c_counter.out[0x40940a]
/src/python/kmer/c_counter.out[0x404528]
/src/python/kmer/c_counter.out[0x404f9d]
/src/python/kmer/c_counter.out[0x405f42]
/src/python/kmer/c_counter.out[0x4063d6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f189da4d830]
/src/python/kmer/c_counter.out[0x403b39]
======= Memory map: ========
00400000-00440000 r-xp 00000000 00:2f 546796 src/python/kmer/c_counter.out
0063f000-00640000 rw-p 0003f000 00:2f 546796 src/python/kmer/c_counter.out
014a0000-01ebf000 rw-p 00000000 00:00 0 [heap]
7f1898000000-7f1898021000 rw-p 00000000 00:00 0
7f1898021000-7f189c000000 ---p 00000000 00:00 0
7f189da2d000-7f189dbed000 r-xp 00000000 fc:00 1439150 /lib/x86_64-linux-gnu/libc-2.23.so
7f189dbed000-7f189dded000 ---p 001c0000 fc:00 1439150 /lib/x86_64-linux-gnu/libc-2.23.so
7f189dded000-7f189ddf1000 r--p 001c0000 fc:00 1439150 /lib/x86_64-linux-gnu/libc-2.23.so
7f189ddf1000-7f189ddf3000 rw-p 001c4000 fc:00 1439150 /lib/x86_64-linux-gnu/libc-2.23.so
7f189ddf3000-7f189ddf7000 rw-p 00000000 00:00 0
7f189ddf7000-7f189de0d000 r-xp 00000000 fc:00 1439041 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f189de0d000-7f189e00c000 ---p 00016000 fc:00 1439041 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f189e00c000-7f189e00d000 rw-p 00015000 fc:00 1439041 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f189e00d000-7f189e115000 r-xp 00000000 fc:00 1439141 /lib/x86_64-linux-gnu/libm-2.23.so
7f189e115000-7f189e314000 ---p 00108000 fc:00 1439141 /lib/x86_64-linux-gnu/libm-2.23.so
7f189e314000-7f189e315000 r--p 00107000 fc:00 1439141 /lib/x86_64-linux-gnu/libm-2.23.so
7f189e315000-7f189e316000 rw-p 00108000 fc:00 1439141 /lib/x86_64-linux-gnu/libm-2.23.so
7f189e316000-7f189e488000 r-xp 00000000 fc:00 671990 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f189e488000-7f189e688000 ---p 00172000 fc:00 671990 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f189e688000-7f189e692000 r--p 00172000 fc:00 671990 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f189e692000-7f189e694000 rw-p 0017c000 fc:00 671990 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f189e694000-7f189e698000 rw-p 00000000 00:00 0
7f189e698000-7f189e6be000 r-xp 00000000 fc:00 1439146 /lib/x86_64-linux-gnu/ld-2.23.so
7f189e878000-7f189e89f000 rw-p 00000000 00:00 0
7f189e8bc000-7f189e8bd000 rw-p 00000000 00:00 0
7f189e8bd000-7f189e8be000 r--p 00025000 fc:00 1439146 /lib/x86_64-linux-gnu/ld-2.23.so
7f189e8be000-7f189e8bf000 rw-p 00026000 fc:00 1439146 /lib/x86_64-linux-gnu/ld-2.23.so
7f189e8bf000-7f189e8c0000 rw-p 00000000 00:00 0
7ffea4907000-7ffea4929000 rw-p 00000000 00:00 0 [stack]
7ffea49b6000-7ffea49b9000 r--p 00000000 00:00 0 [vvar]
7ffea49b9000-7ffea49bb000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Upvotes: 2
Views: 2255
Reputation: 23711
Changing the function to this will still result in the same crash when inserting the second key:
[...]
putting this code at the start of the program or writing another small program that only does this doesn't seem to reproduce the crash so I'm really confused.
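std::unordered_map has no relevant global state that could change between "everything is fine if I run this test function at the start" and "if I run this test function later, the map crashes". You have memory corruption caused by undefined behavior somewhere else in your program; the observations you made are about the strongest evidence you could get for that. The backtrace agrees: "malloc(): memory corruption" means glibc found its heap bookkeeping already damaged when operator new (_Znwm in the trace) asked for memory, so the insert is where the damage was detected, not where it was caused.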
Upvotes: 2
Reputation: 11968
The function signature suggests that you don't necessarily have a null-terminated string (one that has a \0 as its last character).
Yet the code treats it as one: both strlen(read) and std::string seq(read) scan until they find a \0. If num is the length, you should use the variants that take it as a parameter. I suspect this pattern repeats in other places and at some point you corrupt your memory.
If I were you, I would run the binary under Valgrind, or build it with some other memory analysis tool such as AddressSanitizer, and run it again. That should report the incorrect access where it happens rather than at a later allocation.
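A minimal sketch of that suggestion, assuming num really is the number of valid characters in read (the question does not confirm this). The helper name make_sequence is hypothetical; the point is only that the std::string constructor taking a pointer and a count never reads past the given length:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the sequence from an explicit length instead of
// scanning for a terminating '\0'. Assumes `num` is the number of valid
// characters in `read`, which the question does not confirm.
std::string make_sequence(const char* read, int num) {
    if (read == nullptr || num <= 0) {
        return std::string();
    }
    // Copies exactly `num` characters; no '\0' terminator is required.
    return std::string(read, static_cast<std::size_t>(num));
}
```

Inside process_read, seq.size() would then replace the separate strlen call.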
The way you use unordered_map looks fine to me.
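For what it's worth, the find/insert/else sequence can be collapsed, because operator[] default-constructs an empty vector the first time a key is seen. The sketch below is only an illustration: index_kmers is a hypothetical name, it leaves out the global_index filter from the question, and it also indexes the final kmer (i + k <= size), which the question's loop bound stops one short of.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps every kmer of length k in seq to the list of
// positions where it starts. The global_index filter is omitted here.
std::unordered_map<std::string, std::vector<int>>
index_kmers(const std::string& seq, std::size_t k) {
    std::unordered_map<std::string, std::vector<int>> index;
    if (k == 0 || seq.size() < k) {
        return index;  // no kmer of that size fits
    }
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        // operator[] inserts an empty vector on first use of a key.
        index[seq.substr(i, k)].push_back(static_cast<int>(i));
    }
    return index;
}
```

This does not change the diagnosis: the original insertion code is valid, so the crash still has to come from corruption elsewhere.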
Upvotes: 0