Rud48

Reputation: 1059

Why is my data garbled using ranges::to<std::vector>?

I am using the C++ ranges library and view pipelines to read CSV files. My code works until I add ranges::to<std::vector> at the end of the pipeline. With the to added, every row of the output is a garbled copy of the last line of the data.

The correct output is:

"Values to Analyze, Current",   1,  7,  2025,   5,  218,    11, 23, 119 
"Values to Analyse, Current",   1,  8,  2025,   21, 2055,   1,  7,  131 
"Values to Analyse, Current",   1,  9,  2025,   29, 210,    3,  4,  105 
"Values to Analyse, Current",   1,  10, 2025,   5,  214,    4,  7,  124 

The garbled output is:

"Values to Analyse, Current",   1,  10  ,2025   ,5  ,214    ,4, 7,1 24  
"Values to Analyse, Current",   1,  10  ,2025   ,5, 214,4   ,7  ,1  24  
"Values to Analyse, Current",   1,  10  ,2025   ,5, 214,    4,  7,  124 
"Values to Analyse, Current",   1,  10, 2025,   5,  214,    4,  7,  124 

Here is the code. It is long, but I couldn't work up a shorter example. There is also a link to it on Compiler Explorer.

#include <fstream>
#include <ranges>
#include <spanstream>
#include <string>
#include <string_view>
#include <vector>
using namespace std::literals;

namespace rng = std::ranges;
namespace vws = std::views;
using vws::chunk_by, vws::transform;

#include <fmt/ranges.h>
using fmt::println, fmt::print;

inline auto to_vec{rng::to<std::vector>()};
//----------------------------------------------------------------------------------------------
struct Line { // @formatter:off
   std::string mLine;
   friend void operator>>(std::istream& is, Line& l) { std::getline(is, l.mLine); }
}; // @formatter:on
//--------------------------------------------------------------------------------------------------
void print_data(auto&& data) {
   for (auto const& line : data) {
      for (auto const& cnk : line) {
         for (auto const& elem : cnk) {
            print("{}", elem);
         }
         print("\t");
      }
      println("");
   }
}
//--------------------------------------------------------------------------------------------------
auto parse_csv = transform([](auto&& line) { //
      static bool is_quoting{false};
      return line.mLine //
             | chunk_by([](char const lhs, char const) {
                if (lhs == '"') {
                   is_quoting = !is_quoting;
                }
                return (lhs != ',' or is_quoting);
             }) //
             | to_vec;
   });
//--------------------------------------------------------------------------------------------------
constexpr std::string_view test_data{ // @formatter:off
   R"("Values to Analyze, Current",1,7,2025,5,218,11,23,119
"Values to Analyse, Current",1,8,2025,21,2055,1,7,131
"Values to Analyse, Current",1,9,2025,29,210,3,4,105
"Values to Analyse, Current",1,10,2025,5,214,4,7,124)"
};//  @formatter:on
//--------------------------------------------------------------------------------------------------
auto main() -> int {

   std::ispanstream csv_chars(test_data);
   auto data{vws::istream<Line>(csv_chars)};
   auto csv_data{data | parse_csv};

#if 0    // works okay
   print_data(csv_data);
#elif 1  // why does the vector garble output?
   print_data(csv_data | to_vec);
#else    // transform deleted - why does it work above?
   println("{}", fmt::join(csv_data, " "));
#endif

   return 0;
}

Upvotes: 1

Views: 90

Answers (2)

Rud48

Reputation: 1059

Update: removed the to_vec at the end of parse_csv. The explanation is below.

Here is a modified parse_csv that works as desired:

auto parse_csv =               //
   transform([](auto&& line) { //
         bool is_quoting{false};

         return line.mLine //
                | chunk_by([&is_quoting](char const lhs, char const) {
                   if (lhs == '"') {
                      is_quoting = !is_quoting;
                   }
                   return (lhs != ',' or is_quoting);
                })                                     //
                | rng::to<std::vector<std::string>>(); //
      }); // | to_vec;

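Changing to_vec to rng::to<std::vector<std::string>>() inside the lambda stopped the garbled output. Each line now becomes a vector of strings that own their characters, instead of a vector of subranges that only point into the line's string. The quote flag is_quoting is also a local captured by reference now, rather than a static, so its state no longer carries over from one line to the next.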

With that resolved, the "transform deleted" problem at the call to fmt::join(csv_data, "\n") went away once I added to_vec at the end of csv_data.

Update: ignore the previous paragraph. The to_vec is needed only at the join: fmt::join(csv_data | to_vec, "\n"). I realized that a large file is best read line by line rather than loaded into one giant vector, so I removed to_vec from the pipeline and apply it only where the join needs it.
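For comparison, the line-by-line approach can also be sketched without ranges. This is only an illustration under stated assumptions: split_csv_line and for_each_csv_row are hypothetical names, quote characters are kept in the field, and escaped quotes ("") are not handled. Each row is parsed into owned strings and handed to a callback, so at most one line is held in memory at a time.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Splits one CSV line into owned fields. Commas inside double quotes do not
// split. The quote state is local, so it cannot leak from one line to the next.
std::vector<std::string> split_csv_line(std::string_view line) {
   std::vector<std::string> fields(1); // start with one empty field
   bool is_quoting{false};
   for (char const c : line) {
      if (c == '"') {
         is_quoting = !is_quoting;
      }
      if (c == ',' and not is_quoting) {
         fields.emplace_back(); // separator: begin the next field
      } else {
         fields.back() += c;
      }
   }
   return fields;
}

// Reads the stream one line at a time and passes each parsed row to handle_row.
template <typename Fn>
void for_each_csv_row(std::istream& in, Fn&& handle_row) {
   for (std::string line; std::getline(in, line);) {
      handle_row(split_csv_line(line));
   }
}
```

Unlike the chunk_by version, the separators are dropped here rather than left at the end of each chunk.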

Thanks to the comments, I resolved the issue.

Upvotes: -1

Sir Nate

Reputation: 399

The output is garbled because, once you apply to_vec, you are printing subranges that point into strings that are no longer valid.

Borrowing some code from here to print the type names (get_name is the borrowed helper):

   std::ispanstream csv_chars(test_data);
   auto data{vws::istream<Line>(csv_chars)};
   auto csv_data{data | parse_csv};
   println(get_name<decltype(csv_data)>());

   std::ispanstream csv_chars2(test_data);
   auto data2{vws::istream<Line>(csv_chars2)};
   auto csv_data2{data2 | parse_csv | to_vec};
   println(get_name<decltype(csv_data2)>());

you can see that the types are

std::ranges::transform_view<std::ranges::basic_istream_view<Line, char>, (lambda at /app/example.cpp:33:28)>
std::vector<std::vector<std::ranges::subrange<__gnu_cxx::__normal_iterator<char *, std::basic_string<char>>>>>

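Note that the second one, with the to_vec, is not a vector<vector<string>> but a vector<vector<subrange<string-iterator>>>. These subranges hold iterators into the std::string of the Line object that the istream view reads into. The view keeps a single Line and overwrites it on every read, so by the time the outer vector has been built, every stored subrange refers to a string that later lines have overwritten or reallocated. That is why each row shows pieces of the last line. Without the to_vec, each line is printed while its string is still current, because the data is being streamed, if you will, one line at a time.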

Upvotes: 3
