Reputation: 49
I am trying to read a dictionary file in which each line contains a word id, a word, and a frequency, separated by whitespace. The problem is that every entry in the map used to store the words ends up with the same word. I would very much appreciate any help.
typedef struct{
int id;
int count;
char* word;
} WORD;
//read file
std::map<int, WORD*> readWordMap(char* file_name)
{
std::ifstream infile(file_name, std::ifstream::in);
std::cout<<"word map read file:"<<file_name<<std::endl;
if (! infile) {
std::cerr<<"oops! unable to open file "<<file_name<<std::endl;
exit(-1);
}
std::map<int, WORD*> map;
std::vector<std::string> tokens;
std::string line;
char word[100];
int size;
while (std::getline(infile, line)) {
size = (int)split(line, tokens, ' ');
WORD* entry = (WORD*) malloc(sizeof(WORD*));
entry->id = atoi(tokens[0].c_str());
entry->count = atoi(tokens[2].c_str());
strcpy(word, tokens[1].c_str());
entry->word = word;
map[entry->id] = entry;
std::cout<< entry->id<<" "<<entry->word<<" "<<entry->count<<std::endl;
}
infile.close();
std::cout<<map.size()<<std::endl;
std::map<int, WORD*>::const_iterator it;
for (it = map.begin(); it != map.end(); it++) {
std::cout<<(it->first)<<" "<<(it->second->word)<<std::endl;
}
return map;
}
//split string by a delimiter
size_t split(const std::string &txt, std::vector<std::string> &strs, char ch)
{
size_t pos = txt.find( ch );
size_t initialPos = 0;
strs.clear();
while( pos != std::string::npos ) {
strs.push_back( txt.substr( initialPos, pos - initialPos + 1 ) );
initialPos = pos + 1;
pos = txt.find( ch, initialPos );
}
strs.push_back( txt.substr( initialPos, std::min( pos, txt.size() ) - initialPos + 1 ) );
return strs.size();
}
Data file:
2 I 1
3 gave 1
4 him 1
5 the 3
6 book 3
7 . 3
8 He 2
9 read 1
10 loved 1
result:
2 I 1
3 gave 1
4 him 1
5 the 3
6 book 3
7 . 3
8 He 2
9 read 1
10 loved 1
map size:9
2 loved
3 loved
4 loved
5 loved
6 loved
7 loved
8 loved
9 loved
10 loved
Upvotes: 0
Views: 4371
Reputation: 5988
WORD* entry = (WORD*) malloc(sizeof(WORD*));
allocates only enough memory for a WORD pointer, not for a whole WORD struct. Writing to the struct's fields then runs past the end of that allocation, which is undefined behavior. Separately, every entry->word is set to the address of the same local buffer (char word[100]), so all values in the map point to the same location and show the last word that was read. The allocation should be
WORD* entry = new WORD;
A cleaner way of doing it is to store the word in a std::string and parse each line with a std::istringstream:
struct WORD{
int id;
int count;
std::string word;
};
while (std::getline(infile, line)) {
WORD* entry = new WORD;
std::istringstream iss(line);
iss >> entry->id >> entry->word >> entry->count;
map[entry->id] = entry;
std::cout<< entry->id<<" "<<entry->word<<" "<<entry->count<<std::endl;
}
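For reference, here is a minimal self-contained sketch of the same std::istringstream approach. It is only one possible arrangement: the helper name readWordMapFromStream is hypothetical, and storing WORD by value instead of through new is a choice made here so that nothing has to be deleted later.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Same layout as the struct above, repeated so the sketch compiles on its own.
struct WORD {
    int id;
    int count;
    std::string word;
};

// Hypothetical helper (not from the original post): reads "id word count"
// lines from any input stream and returns the entries keyed by id.
std::map<int, WORD> readWordMapFromStream(std::istream& in)
{
    std::map<int, WORD> result;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        WORD entry;
        // Skip blank or malformed lines instead of storing partial data.
        if (!(iss >> entry.id >> entry.word >> entry.count)) {
            continue;
        }
        result[entry.id] = entry;  // the map keeps its own copy of the entry
    }
    return result;
}
```

Passing an opened std::ifstream to this function reads from a file, and a std::istringstream works the same way for quick experiments. Because each WORD owns its std::string, no two entries can end up sharing the same character buffer.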
Upvotes: 1
Reputation: 56479
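You never allocate separate memory for WORD::word. strcpy writes into the single local buffer char word[100], and you assign the address of that one buffer to every item in the map, so they all show whatever was copied into it last. The buffer is also a local variable, so those pointers become dangling once the function returns.
It is also better to use std::string instead of C-style strings. In addition, you can use std::stoi (available since C++11) to convert strings to integers. Try this: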
struct WORD{
int id;
int count;
std::string word;
};
std::map<int, WORD> readWordMap(const std::string &file_name)
{
...
std::map<int, WORD> map;
...
while (std::getline(infile, line)) {
...
WORD entry;
entry.id = std::stoi(tokens[0]);
entry.count = std::stoi(tokens[2]);
entry.word = tokens[1];
map[entry.id] = entry;
...
}
infile.close();
...
}
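The shared-buffer effect described above can be reproduced in isolation. The following is a small sketch, not the original code: the function names viaSharedBuffer and viaOwnedStrings are made up for illustration. The first function stores the address of one char buffer for every word, as the question does; the second stores std::string copies.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Illustrative only: mimics the question's pattern of copying each word into
// one buffer and remembering the buffer's address for every entry.
std::vector<std::string> viaSharedBuffer(const std::vector<std::string>& inputs)
{
    char buffer[100];
    std::vector<const char*> stored;
    for (const std::string& s : inputs) {
        std::strncpy(buffer, s.c_str(), sizeof(buffer) - 1);
        buffer[sizeof(buffer) - 1] = '\0';
        stored.push_back(buffer);  // the same address on every iteration
    }
    // Read the pointers back while the buffer is still alive.
    return std::vector<std::string>(stored.begin(), stored.end());
}

// Illustrative only: each stored element is an independent std::string copy.
std::vector<std::string> viaOwnedStrings(const std::vector<std::string>& inputs)
{
    std::vector<std::string> stored;
    for (const std::string& s : inputs) {
        stored.push_back(s);
    }
    return stored;
}
```

Called with the words I, gave, and loved, the first function returns the last word three times, while the second returns the three words unchanged. That matches the output in the question, where every key maps to "loved".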
Upvotes: 1