ellieu
ellieu

Reputation: 11

Improve Time Efficiency of Driver Program

Sorry that the title is vague. Essentially I am trying to approve the time (and overall) efficiency of a C++ driver program which:

  1. Reads in a file line by line using ifstream

  2. It is vital to my program that the lines are processed seperately, so I currently have 4 seperate calls to getline.

  3. The program reads the string line into a vector of integers using string-stream.

  4. Finally, it converts the vector into to a linked list of integers. Is there a way or a function that can directly read the integers from the file into the ll of integers?

Here is the driver code:

int main(int argc, char *argv[])
{
    ifstream infile(argv[1]);

    vector<int> vals_add;
    vector<int> vals_remove;

    //Driver Code
    if(infile.is_open()){

        string line;
        int n;
        getline(infile, line);
        istringstream iss (line);


        getline(infile, line);
        istringstream iss2 (line);
        while (iss2 >> n){
            vals_add.push_back(n);
        }

        getline(infile, line);
        istringstream iss3 (line);

        getline(infile, line);
        istringstream iss4 (line);
        while (iss4 >> n){
            vals_remove.push_back(n);
        }


        int array_add[vals_add.size()];
        copy(vals_add.begin(), vals_add.end(), array_add);


        int array_remove[vals_remove.size()];
        copy(vals_remove.begin(), vals_remove.end(), array_remove);



        Node *ptr = CnvrtVectoList(array_add, sizeof(array_add)/sizeof(int));
        print(ptr);
        cout << "\n";

        for(int i = 0; i < vals_remove.size(); i++){
           deleteNode(&ptr, vals_remove[i]);
        }


        print(ptr);
        cout << "\n";

    }

Here is a small example input:

7

6 18 5 20 48 2 97

8

3 6 9 12 28 5 7 10

Where lines 2 and 4 MUST be processed as separate lists, and lines 1 and 3 are the size of the lists (they must dynamically allocate memory so the size must remain exact to the input).

Upvotes: 1

Views: 77

Answers (2)

First of all: why do you use some custom list data structure? It's very likely that it is half-baked, i.e. doesn't have support for allocators, and thus would be much harder to adapt to perform well. Just use std::list for a doubly-linked list, or std::forward_list for a singly-linked list. Easy.

There are several requirements that you seem to imply:

  1. The values of type T (for example: an int) are to be stored in a linked list - either std::list<T> or std::forward_list<T> (not a raw list of Nodes).

  2. The data shouldn't be unnecessarily copied - i.e. the memory blocks shouldn't be reallocated.

  3. The parsing should be parallelizable, although this makes sense only on fast data sources where the I/O won't dwarf CPU time.

The idea is then:

  1. Use a custom allocator to carve memory in contiguous segments that can store multiple list nodes.

  2. Parse the entire file into linked lists that uses the above allocator. The list will allocate memory segments on demand. A new list is started on each newline.

  3. Return the 2nd and 4th list (i.e. lists of elements in the 2nd and 4th line).

It's worth noting that the lines that contain element counts are unnecessary. Of course, that data could be passed to the allocator to pre-allocate enough memory segments, but this disallows parallelization, since parallel parsers don't know where the element counts are - these get found only after the parallel-parsed data is reconciled. Yes, with a small modification, this parsing can be completely parallelized. How cool is that!

Let's start simple and minimal: parse the file to produce two lists. The example below uses a std::istringstream over the internally generated text view of the dataset, but parse could also be passed a std::ifstream of course.

// https://github.com/KubaO/stackoverflown/tree/master/questions/linked-list-allocator-58100610
#include <forward_list>
#include <iostream>
#include <sstream>
#include <vector>

using element_type = int;

template <typename allocator> using list_type = std::forward_list<element_type, allocator>;

template <typename allocator>
std::vector<list_type<allocator>> parse(std::istream &in, allocator alloc)
{
   using list_t = list_type<allocator>;
   std::vector<list_t> lists;
   element_type el;
   list_t *list = {};
   do {
      in >> el;
      if (in.good()) {
         if (!list) list = &lists.emplace_back(alloc);
         list->push_front(std::move(el));
      }
      while (in.good()) {
         int c = in.get();
         if (!isspace(c)) {
            in.unget();
            break;
         }
         else if (c=='\n') list = {};
      }
   } while (in.good() && !in.eof());
   for (auto &list : lists) list.reverse();
   return lists;
}

And then, to test it:

const std::vector<std::vector<element_type>> test_data = {
   {6, 18, 5, 20, 48, 2, 97},
   {3, 6, 9, 12, 28, 5, 7, 10}
};

template <typename allocator = std::allocator<element_type>>
void test(const std::string &str, allocator alloc = {})
{
   std::istringstream input{str};
   auto lists = parse(input, alloc);
   assert(lists.size() == 4);
   lists.erase(lists.begin()+2); // remove the 3rd list
   lists.erase(lists.begin()+0); // remove the 1st list
   for (int i = 0; i < test_data.size(); i++)
      assert(std::equal(test_data[i].begin(), test_data[i].end(), lists[i].begin()));
}

std::string generate_input()
{
   std::stringstream s;
   for (auto &data : test_data) {
      s << data.size() << "\n";
      for (const element_type &el : data) s << el << " ";
      s << "\n";
   }
   return s.str();
}

Now, let's look at a custom allocator:

class segment_allocator_base
{
protected:
   static constexpr size_t segment_size = 128;
   using segment = std::vector<char>;
   struct free_node {
      free_node *next;
      free_node() = delete;
      free_node(const free_node &) = delete;
      free_node &operator=(const free_node &) = delete;
      free_node *stepped_by(size_t element_size, int n) const {
         auto *p = const_cast<free_node*>(this);
         return reinterpret_cast<free_node*>(reinterpret_cast<char*>(p) + (n * element_size));
      }
   };
   struct segment_store {
      size_t element_size;
      free_node *free = {};
      explicit segment_store(size_t element_size) : element_size(element_size) {}
      std::forward_list<segment> segments;
   };
   template <typename T> static constexpr size_t size_for() {
      constexpr size_t T_size = sizeof(T);
      constexpr size_t element_align = std::max(alignof(free_node), alignof(T));
      constexpr auto padding = T_size % element_align;
      return T_size + padding;
   }
   struct pimpl {
      std::vector<segment_store> stores;
      template <typename T> segment_store &store_for() {
         constexpr size_t element_size = size_for<T>();
         for (auto &s : stores)
            if (s.element_size == element_size) return s;
         return stores.emplace_back(element_size);
      }
   };
   std::shared_ptr<pimpl> dp{new pimpl};
};

template<typename T>
class segment_allocator : public segment_allocator_base
{
   segment_store *d = {};
   static constexpr size_t element_size = size_for<T>();
   static free_node *advanced(free_node *p, int n) { return p->stepped_by(element_size, n); }
   static free_node *&advance(free_node *&p, int n) { return (p = advanced(p, n)); }
   void mark_free(free_node *free_start, size_t n)
   {
      auto *p = free_start;
      for (; n; n--) p = (p->next = advanced(p, 1));
      advanced(p, -1)->next = d->free;
      d->free = free_start;
   }
public:
   using value_type = T;
   using pointer = T*;
   template <typename U> struct rebind {
      using other = segment_allocator<U>;
   };
   segment_allocator() : d(&dp->store_for<T>()) {}
   segment_allocator(segment_allocator &&o) = default;
   segment_allocator(const segment_allocator &o) = default;
   segment_allocator &operator=(const segment_allocator &o) {
      dp = o.dp;
      d = o.d;
      return *this;
   }
   template <typename U> segment_allocator(const segment_allocator<U> &o) :
      segment_allocator_base(o), d(&dp->store_for<T>()) {}
   pointer allocate(const size_t n) {
      if (n == 0) return {};
      if (d->free) {
         // look for a sufficiently long contiguous region
         auto **base_ref = &d->free;
         auto *base = *base_ref;
         do {
            auto *p = base;
            for (auto need = n; need; need--) {
               auto *const prev = p;
               auto *const next = prev->next;
               advance(p, 1);
               if (need > 1 && next != p) {
                  base_ref = &(prev->next);
                  base = next;
                  break;
               } else if (need == 1) {
                  *base_ref = next; // remove this region from the free list
                  return reinterpret_cast<pointer>(base);
               }
            }
         } while (base);
      }
      // generate a new segment, guaranteed to contain enough space
      size_t count = std::max(n, segment_size);
      auto &segment = d->segments.emplace_front(count);
      auto *const start = reinterpret_cast<free_node*>(segment.data());
      if (count > n)
         mark_free(advanced(start, n), count - n);
      else
         d->free = nullptr;
      return reinterpret_cast<pointer>(start);
   }
   void deallocate(pointer ptr, std::size_t n) {
      mark_free(reinterpret_cast<free_node*>(ptr), n);
   }

   using propagate_on_container_copy_assignment = std::true_type;
   using propagate_on_container_move_assignment = std::true_type;
};

For the little test data we've got, the allocator will only allocate a segment... once!

To test:

int main()
{  
   auto test_input_str = generate_input();
   std::cout << test_input_str << std::endl;
   test(test_input_str);
   test<segment_allocator<element_type>>(test_input_str);
   return 0;
}

Parallelization would leverage the allocator above, starting multiple threads and in each invoking parse on its own allocator, each parser starting at a different point in the file. When the parsing is done, the allocators would have to merge their segment lists, so that they'd compare equal. At that point, the linked lists could be combined using usual methods. Other than thread startup overhead, the parallelization would have negligible overhead, and there'd be no data copying involved to combine the data post-parallelization. But I leave this exercise to the reader.

Upvotes: 0

Konrad Rudolph
Konrad Rudolph

Reputation: 545985

There are multiple points that can be improved.

First off, remove unnecessary code: you’re not using iss and iss3. Next, your array_add and array_remove seem to be redundant. Use the vectors directly.

If you have a rough idea of how many values you’ll read on average, reserve space in the vectors to avoid repeated resizing and copying (actually you seem to have these numbers in your input; use this information instead of throwing it away!). You can also replace your while reading loops with std::copy and std::istream_iterators.

You haven’t shown how CnvrtVectoList is implemented but in general linked lists aren’t particularly efficient to work with due to lack of locality: they throw data all over the heap. Contiguous containers (= vectors) are almost always more efficient, even when you need to remove elements in the middle. Try using a vector instead and time the performance carefully.

Lastly, can you sort the values? If so, then you can implement the deletion of values a lot more efficiently using iterative calls to std::lower_bound, or a single call to std::set_difference.

If (and only if!) the overhead is actually in the reading of the numbers from a file, restructure your IO code and don’t read lines separately (that way you’ll avoid many redundant allocations). Instead, scan directly through the input file (optionally using a buffer or memory mapping) and manually keep track of how many newline characters you’ve encountered. You can then use the strtod family of functions to scan numbers from the input read buffer.

Or, if you can assume that the input is correct, you can avoid reading separate lines by using the information provided in the file:

int add_num;
infile >> add_num;
std::copy_n(std::istream_iterator<int>(infile), std::inserter(your_list, std::end(your_list));

int del_num;
infile >> del_num;
std::vector<int> to_delete(del_num);
std::copy_n(std::istream_iterator<int>(infile), del_num, to_delete.begin());
for (auto const n : del_num) {
    deleteNode(&ptr, n);
}

Upvotes: 1

Related Questions