Nick Yang

Reputation: 31

How can I load data from .txt efficiently in C++?

I am currently using fstream in C++ to load a 7.1GB .txt file. The file looks like this:

 item1  2.87  4.64  ... 
 item2  5.89  9.24  ... 
 ...     ...   ...  ... 

It has 300000 rows and 201 columns (1 column for the item name and 200 for its weights), and each weight cell holds a double. What I do now is this:

ifstream click_log(R"(1.txt)", ifstream::in);
string line;
unordered_map<string, vector<double>> dict;
while (getline(click_log, line)){
    istringstream record(line);
    string key;
    vector<double> weights;
    double weight;
    record >> key;
    while (record >> weight){
        weights.push_back(weight);
    }
    dict[key] = weights;
}

However, it takes my computer (AMD 3700X, 8 cores) about 30 minutes to load the file completely. Is it slow because of its O(m*n) complexity, or simply because converting strings to doubles is slow? What is the most efficient way to load data from a .txt file?

Upvotes: 3

Views: 200

Answers (1)

Fareanor

Reputation: 6805

You should not recreate your variables at each loop iteration. Create them once and for all, then you can reassign them when needed.

If you want to use std::vector instead of std::array<double, 200>, then you should reserve(200) all of your vectors in order to avoid a lot of reallocations/copies/deallocations due to std::vector's machinery.

You can do the same for your std::unordered_map.

Finally, write your data directly into the target container; you don't need that many temporaries. This removes the overhead of the unnecessary copies (in your code, dict[key] = weights copies the whole vector for every row).

I have rewritten your code taking these guidelines into account. I expect it to improve performance:

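#include <array>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

int main()
{
    std::ifstream ifs("..\\tests\\data\\some_data.txt"); // Replace with your file
    if(!ifs)
        return -1;

    std::unordered_map<std::string, std::array<double, 200>> dict;
    dict.reserve(300000); // Allocate the buckets once, up front

    std::string line;
    std::string key;
    double weight;
    std::size_t i;

    while(std::getline(ifs, line))
    {
        std::istringstream record(line);
        i = 0;

        if(!(record >> key))
            continue; // Skip empty lines

        auto& weights = dict[key]; // Look the key up once per line, not once per value

        // Stop at 200 values so that a malformed line cannot run past the array
        while(i < weights.size() && record >> weight)
        {
            weights[i++] = weight;
        }
    }

    // The whole file is loaded (ifs is closed automatically when it goes out of scope)

    return 0;
}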
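If the number of columns is not known at compile time, the std::vector variant mentioned earlier could look roughly like the sketch below. The function name load_weights and the expected_rows / expected_cols parameters are made up for illustration, and the function takes a std::istream& only so that it can be tried on a std::istringstream as well as on a file:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: reads "key w1 w2 ..." lines from any input stream.
// expected_rows and expected_cols are only size hints, not hard limits.
std::unordered_map<std::string, std::vector<double>>
load_weights(std::istream& in, std::size_t expected_rows, std::size_t expected_cols)
{
    std::unordered_map<std::string, std::vector<double>> dict;
    dict.reserve(expected_rows); // allocate the buckets once

    std::string line;
    std::string key;
    double weight;

    while (std::getline(in, line))
    {
        std::istringstream record(line);
        if (!(record >> key))
            continue; // skip empty lines

        // Write straight into the map's own vector: no temporary, no copy.
        std::vector<double>& weights = dict[key];
        weights.clear();                // a repeated key overwrites the earlier row
        weights.reserve(expected_cols); // one allocation per row instead of repeated growth

        while (record >> weight)
            weights.push_back(weight);
    }
    return dict;
}
```

For the file in the question, the call would be something like load_weights(ifs, 300000, 200). Rows with more values than expected_cols still load; they just trigger a reallocation.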

Of course, I don't claim this is the most efficient way to do it. I'm sure there are further improvements that I haven't thought of yet.

Anyway, keep in mind that you will probably still have a bottleneck in hard drive access and I/O operations.
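On the question of whether the string-to-double conversion itself is the slow part: one thing that may be worth measuring is parsing each line in place with std::strtod instead of constructing a std::istringstream per line. The following is only a sketch of that idea, not a benchmarked result; parse_doubles is a hypothetical helper name, and std::strtod honours the current C locale, so this assumes '.' is the decimal separator:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses whitespace-separated doubles from `line`,
// starting at offset `pos`, into out[0..max_count). Returns how many were read.
std::size_t parse_doubles(const std::string& line, std::size_t pos,
                          double* out, std::size_t max_count)
{
    const char* p = line.c_str() + pos; // c_str() is null-terminated, so strtod stops safely
    std::size_t n = 0;

    while (n < max_count)
    {
        char* end = nullptr;
        const double value = std::strtod(p, &end); // skips leading whitespace itself
        if (end == p)
            break; // nothing more that looks like a number
        out[n++] = value;
        p = end;   // continue right after the number just parsed
    }
    return n;
}
```

In the loading loop, the key would be taken as the text before the first whitespace character (for example via line.find_first_of(" \t")), and the rest of the line would be passed to parse_doubles together with weights.data() and weights.size(). Whether this actually beats the stream-based version depends on the standard library and the disk, so time both before committing to either.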

Upvotes: 2
