user9118941

Reputation: 11

Reading a large amount of data from a txt file and sorting it

I'm trying to write a C++ program that reads file names from a txt file, counts how often each one appears, and builds a top 10 list. A small sample of the input is shown below:

local - - [24/Oct/1994:13:41:41 -0600] "GET index.html HTTP/1.0" 200 150   
local - - [24/Oct/1994:13:41:41 -0600] "GET 1.gif HTTP/1.0" 200 1210  
local - - [24/Oct/1994:13:43:13 -0600] "GET index.html HTTP/1.0" 200 3185
local - - [24/Oct/1994:13:43:14 -0600] "GET 2.gif HTTP/1.0" 200 2555         
local - - [24/Oct/1994:13:43:15 -0600] "GET 3.gif HTTP/1.0" 200 36403   
local - - [24/Oct/1994:13:43:17 -0600] "GET 4.gif HTTP/1.0" 200 441    
local - - [24/Oct/1994:13:46:45 -0600] "GET index.html HTTP/1.0" 200 31853

My code so far is below:

#include <iostream>
#include <fstream>
#include <sstream>
#include <unordered_map>
#include <vector>
#include <iterator>
#include <algorithm>
#include <functional>
#include <string>
#include <cstdio>


std::string get_file_name(const std::string& s) {
    std::size_t first = s.find_first_of("\"");
    std::size_t last = s.find_last_of("\"");

    // Length is last - first; first - last would wrap around as an unsigned value.
    std::string request = s.substr(first, last - first);

    std::size_t file_begin = request.find_first_of(' ');
    std::string truncated_request = request.substr(++file_begin);

    std::size_t file_end = truncated_request.find(' ');
    std::string file_name = truncated_request.substr(0, file_end);

    return file_name;
}



int main() {

    std::ifstream f_s("text.txt");
    std::string content;
    std::unordered_map<std::string,long int> file_access_counts;

    while (std::getline(f_s, content)) {
        auto file_name = get_file_name(content);
        auto item = file_access_counts.find(file_name);

        if (item != file_access_counts.end()) {
            ++file_access_counts.at(file_name);
        }
        else {
            file_access_counts.insert(std::make_pair(file_name, 1));
        }
    }

    f_s.close();

    std::ofstream ofs;
    ofs.open("all.txt", std::ofstream::out | std::ofstream::app);

    for (auto& n : file_access_counts)
        ofs << n.first << ", " << n.second << std::endl;

    std::ifstream file("all.txt");
    std::vector<std::string> rows;

    while (!file.eof())
    {
        std::string line;
        std::getline(file, line);
        rows.push_back(line);
    }

    std::sort(rows.begin(), rows.end());
    std::vector<std::string>::iterator iterator = rows.begin();
    for (; iterator != rows.end(); ++iterator)
        std::cout << *iterator << std::endl;

    getchar();


    return 0;
}

When I run it, it shows the file names and how many times each one appears, but not ordered from highest to lowest count, and I don't think it will work with large inputs (like 50000 lines). Can you help me? Thank you.

Upvotes: 1

Views: 53

Answers (1)

Benjamin Cuningham

Reputation: 886

The contents of all.txt are sorted as plain strings after being read back in. The problem is that the count is at the end of each line and therefore only affects the order when two names are identical.

all.txt:

3.gif, 1
index.html, 3
1.gif, 1
2.gif, 1
4.gif, 1

rows vector after sort:

1.gif, 1
2.gif, 1
3.gif, 1
4.gif, 1
index.html, 3

Either change the way the values are being written to all.txt, or parse the count before sorting.
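For the second option, one way to avoid parsing text at all is to sort the (name, count) pairs straight from the map and skip the round trip through all.txt. The following is only a sketch: `top_n` and `Entry` are hypothetical names that are not in the question's code, and breaking ties by name is an assumption made so the output is deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, long>;

// Returns up to n (name, count) pairs, highest count first.
// Ties are broken by name (an assumption, to keep the order deterministic).
std::vector<Entry> top_n(const std::unordered_map<std::string, long>& counts,
                         std::size_t n) {
    // Copy the map into a vector, because a map cannot be sorted by value.
    std::vector<Entry> entries(counts.begin(), counts.end());
    n = std::min(n, entries.size());

    // Only the first n positions need to be in order.
    std::partial_sort(entries.begin(),
                      entries.begin() + static_cast<std::ptrdiff_t>(n),
                      entries.end(),
                      [](const Entry& a, const Entry& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;
                      });
    entries.resize(n);
    return entries;
}
```

In the question's `main`, calling `top_n(file_access_counts, 10)` after the read loop and printing each pair would give the top 10 list. The map holds one entry per distinct file name, so 50000 input lines should not be a problem for memory.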

If you put the count at the beginning of the line, be sure to pad it with zeros so that 3 sorts before 10. Without padding, "10" compares less than "3" as a string.
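A rough sketch of the zero-padded format, in case it helps. `padded_line` is a hypothetical helper, and the default width of 6 digits is an assumption: it has to be wide enough for the largest count you expect, and counts are assumed to be non-negative.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats one line as "<zero-padded count>, <name>", e.g. "000003, index.html".
// The width is an assumption; it must cover the largest possible count.
std::string padded_line(const std::string& name, long count, int width = 6) {
    std::ostringstream out;
    // setw applies only to the next value written; setfill stays in effect.
    out << std::setw(width) << std::setfill('0') << count << ", " << name;
    return out.str();
}
```

Lines written this way sort correctly as strings. `std::sort(rows.begin(), rows.end())` gives lowest count first, so sort with `rows.rbegin()` and `rows.rend()` instead to get the highest counts at the top.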

Upvotes: 1
