Reputation: 281
I have a file in following format:
[1]
Parameter1=Value1
.
.
.
End
[2]
.
.
The number between brackets presents id of the entity. There're around 4500 entites. I need to parse through all entites and pick the ones matching my parameters and values. Size of file is around 20mb. My first approach was to reading file line by line and storing them in a struct array like:
struct Component{
std::string parameter;
std::string value;
};
struct Entity{
std::string id;
std::list<Component> components;
};
std::list<Entity> g_entities;
But this approach took enormous amount of memory and was very slow. I've also tried storing only the ones that match my parameters/values. But that also was really slow and took quite some memory. Ideally i would like to store all data in memory so that i won't have to load the file everytime i need to filter my parameters/values if it's possible with reasonable amount of memory usage.
Edit 1: I read file line by line:
std::ifstream readTemp(filePath);
std::stringstream dataStream;
dataStream << readTemp.rdbuf();
readTemp.close();
while (std::getline(dataStream, line)){
if (line.find('[') != std::string::npos){
// Create Entity
Entity entity;
// Set entity id
entity.id = line.substr(line.find('[') + 1, line.find(']') - 1);
// Read all lines until EnumEnd=0
while (1){
std::getline(dataStream, line);
// Break loop if end of entity
if (line.find("EnumEnd=0") != std::string::npos){
if (CheckMatch(entity))
entities.push_back(entity);
entity.components.clear();
break;
}
Component comp;
int pos_eq = line.find('=');
comp.parameterId = line.substr(0, pos_eq);
comp.value = line.substr(pos_eq + 1);
entity.components.push_back(comp);
}
}
}
Upvotes: 2
Views: 184
Reputation: 7448
PS: After your edit. and Comment concerning memory consumption
500MB / 20MB = 25.
If each line is 25 chars long, the memory consumption looks ok.
OK you could use a look-up table for mapping parameter-names to numbers. If the names-set is small, this will save the consumption up to 2 times.
Your data structure could look like this:
std::map<int, std::map<int, std::string> > my_ini_file_data;
std::map<std::string, int> param_to_idx;
(provided the parameter names within sections (entities as you call it) are not unique)
Putting the data is:
std::string param = "Param";
std::string value = "Val";
int entity_id = 0;
if ( param_to_idx.find(param) == param_to_idx.end() )
param_to_idx[param] = param_to_idx.size();
my_ini_file_data[entity_id][ param_to_idx[param] ] = value;
getting the data is:
value = my_ini_file_data[entity_id][ param_to_idx[param] ];
If the values-set is also considerably smaller than the number of entries, you could even map values to numbers:
std::map<int, std::map<int, int> > my_ini_file_data;
std::map<std::string, int> param_to_idx;
std::map<std::string, int> value_to_idx;
std::map<int, std::string> idx_to_value;
Putting the data is:
std::string param = "Param";
std::string value = "Val";
int entity_id = 0;
if ( param_to_idx.find(param) == param_to_idx.end() )
param_to_idx[param] = param_to_idx.size();
if ( value_to_idx.find(value) == value_to_idx.end() )
{
int idx = value_to_idx.size();
value_to_idx[value] = idx;
idx_to_value[idx] = value;
}
my_ini_file_data[entity_id][ param_to_idx[param] ] = value_to_idx[value];
getting the data is:
value = idx_to_value[my_ini_file_data[entity_id][ param_to_idx[param] ] ];
Hope, this helps.
Initial answer
Concerning memory, I wouldn't care unless you have a kind of embedded system with very small memory.
Concerning the speed, I could give you some suggestions:
Find out, what is the bottleneck.
If the speed of the simple reading is OK, optimize your parsing code. You can reduce the number of find-calls by storing the position. e.g.
int pos_begin = line.find('[]');
if (pos_begin != std::string::npos){
int pos_end = line.find(']');
if (pos_end != std::string::npos){
entity.id = line.substr(pos_begin + 1, pos_begin - 1);
// Read all lines until EnumEnd=0
while (1){
std::getline(readTemp, line);
// Break loop if end of entity
if (line.find("EnumEnd=0") != std::string::npos){
if (CheckMatch(entity))
entities.push_back(entity);
break;
}
Component comp;
int pos_eq = line.find('=');
comp.parameter= line.substr(0, pos_eq);
comp.value = line.substr(pos_eq + 1);
entity.components.push_back(comp);
}
}
}
Upvotes: 3
Reputation: 18633
You can use less memory by interning your params and values, so as not to store multiple copies of them.
You could have a map of strings to unique numeric IDs, that you create when loading the file, and then just use the IDs when querying your data structure. At the expense of possibly slower parsing initially, working with these structures afterwards should be faster, as you'd only be matching 32-bit integers rather than comparing strings.
Sketchy proof of concept for storing each string once:
#include <unordered_map>
#include <string>
#include <iostream>
using namespace std;
int string_id(const string& s) {
static unordered_map<string, int> m;
static int id = 0;
auto it = m.find(s);
if (it == m.end()) {
m[s] = ++id;
return id;
} else {
return it->second;
}
}
int main() {
// prints 1 2 2 1
cout << string_id("hello") << " ";
cout << string_id("world") << " ";
cout << string_id("world") << " ";
cout << string_id("hello") << endl;
}
The unordered_map
will end up storing each string once, so you're set for memory. Depending on your matching function, you can define
struct Component {
int parameter;
int value;
};
and then your matching can be something like myComponent.parameter == string_id("some_key")
or even myComponent.parameter == some_stored_string_id
. If you want your strings back, you'll need a reverse mapping as well.
Upvotes: 2