Kernel

Reputation: 115

Reading word by word from a file with operator >>

A few hours ago I asked how to read a file with a specific format so that I could implement operator >>. The format of the file was:

Salad;Tomatoe 50;Fresh lettuce 100;Potatoe 60;Onion 10
Macaroni;Macaroni 250;Tomatoe 60;Oil 10
Fish and chips;fish 30;potatoe 30;Oil 40

And I have the following class:

...
#include <list> //I'm using list of the STL
....
class recipe{
 private:
   list<pair<string,unsigned int>> ing; //A list of the ingredients of one recipe. String for the name of the ingredient and unsigned int for the quantity of each ingredient
 public:
  ....

 //My solution for operator >>

istream & operator >> (istream &i, recipe &other){
   string line, data, name_ing;
   string code, name;
   unsigned int plate, quantity;
   list<pair<string,unsigned int>> ings;

   getline(i,line);

   stringstream s (line);

   getline(s,data,';');
   code = data;
   getline(s,data,';');
   plate = atoi(data.c_str());
   getline(s,data,';');
   name = data;

   while(getline(s,data,' ')){
     name_ing = data;

    getline(s,data,';');
    quantity = atoi(data.c_str());

    pair<string,unsigned int> ingredient;
    ingredient.first = name_ing;
    ingredient.second = quantity;

    ings.push_back(ingredient);   
}

   recipe a_recipe(code,plate,name,0,0,0,0,0,ings);
   other = a_recipe;

   return i;
}

So now I have another problem: I don't know how to read ingredients whose names consist of two words, for example "Fresh lettuce 100", because the output would be:

 Salad;Tomatoe 50;Fresh 0;Potatoe 60;Onion 10

It doesn't read "lettuce" or the quantity. Any help?

Upvotes: 1

Views: 100

Answers (2)

A M

Reputation: 15265

As already written:

To solve the problem at hand there is a more or less standard approach. You want to read csv data.

In your case it is a little more difficult, because you have nested csv data: first a ";"-separated list and then a space-separated one. The second is a little imprecise, because an ingredient name can itself contain a space, so an entry may have 2 spaces before the quantity, as in "Red Pepper 2".

Now, how could this be done? C++ is an object-oriented language. You can create objects consisting of data and member functions that operate on this data. We will define a class "Recipe" and overload the inserter and extractor operators, because the class, and only the class, should know how this works. Having done that, input and output become easy.

The extractor, which is the core of the question, is, as said, a little more tricky. How can this be done?

In the extractor we will first read a complete line from a std::istream using the function std::getline. We then have a std::string containing "data-fields" delimited by semicolons. The std::string needs to be split up and the contents of the "data-fields" stored. Additionally, you need to split the ingredients.

The process of splitting up strings is also called tokenizing, and the content of a "data-field" is also called a "token". C++ has a standard facility for this purpose: std::sregex_token_iterator.

And because we have something that has been designed for such purpose, we should use it.

This thing is an iterator for iterating over a std::string, hence sregex. The first two parameters define the range of input on which we operate, then there is a std::regex for what should be matched (or not matched) in the input string. The matching strategy is given by the last parameter.

0 --> give me the stuff that I defined in the regex and
-1 --> give me what is NOT matched by the regex.
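
To make the two modes concrete, here is a minimal sketch. The helper names fieldsBetween and matchesOf are made up for this illustration and are not used in the full example; submatch index -1 returns the text between the matches, index 0 returns the matches themselves.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: submatch index -1 yields the text BETWEEN the
// matches of `sep`, i.e. it splits `line` at every separator.
std::vector<std::string> fieldsBetween(const std::string& line, const std::regex& sep) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), sep, -1),
        std::sregex_token_iterator());   // default-constructed == end iterator
}

// Hypothetical helper: submatch index 0 yields every complete match of `pattern`.
std::vector<std::string> matchesOf(const std::string& line, const std::regex& pattern) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), pattern, 0),
        std::sregex_token_iterator());
}
```

With the line "Salad;Tomatoe 50;Onion 10", fieldsBetween with the regex ";" gives the three fields Salad, Tomatoe 50 and Onion 10, while matchesOf with the regex "[0-9]+" gives 50 and 10.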

We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first and the 2nd iterator into the std::vector.

The statement

std::vector token(std::sregex_token_iterator(line.begin(), line.end(), separator, -1), {});

defines a variable "token" of type std::vector<std::ssub_match> (the element type is deduced from the iterator; each element converts implicitly to std::string), splits up the std::string and puts the tokens into the std::vector. After having the data in the std::vector, we will copy it to the data members of our class.

For the 2nd split we create 2 simple lambdas and copy the data into the ingredients list.

Very simple.
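
The idea behind the second split can be sketched as a standalone function: cut each field at its LAST space, so that any spaces inside the ingredient name survive. The name splitAtLastSpace and the choice to return a quantity of 0 when no space is found are assumptions made for this sketch only, and it expects an already trimmed field.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: "Fresh lettuce 100" -> {"Fresh lettuce", 100}.
// Assumes `field` has no leading or trailing spaces.
std::pair<std::string, int> splitAtLastSpace(const std::string& field) {
    const std::size_t pos = field.rfind(' ');   // position of the last space
    if (pos == std::string::npos)
        return { field, 0 };                     // assumption: no quantity means 0
    // std::stoi throws std::invalid_argument if the tail is not a number
    return { field.substr(0, pos), std::stoi(field.substr(pos + 1)) };
}
```

Searching from the back with rfind is what makes two-word names work: only the last space separates name and quantity.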

Next step. We want to read from a file. The file also contains a sequence of similar items: its rows.

As above, we can iterate over such similar data, whether it comes from a file or from anywhere else. For this purpose C++ has the std::istream_iterator. This is a template: as a template parameter it gets the type of data that it should read and, as a constructor parameter, a reference to an input stream. It doesn't matter whether the input stream is std::cin, a std::ifstream or a std::istringstream. The behaviour is identical for all kinds of streams.
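
A minimal sketch of this mechanism, using a made-up type Row that simply stores one line (Row, readRows and readRowsFromText are hypothetical names for this illustration): std::istream_iterator<Row> calls the operator >> of Row again and again until extraction fails.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: one row of the file, stored as raw text.
struct Row { std::string text; };

// The extractor that std::istream_iterator<Row> will call for each element.
std::istream& operator>>(std::istream& is, Row& r) {
    return std::getline(is, r.text);
}

// Works for any std::istream: std::cin, a std::ifstream, a std::istringstream ...
std::vector<Row> readRows(std::istream& in) {
    return std::vector<Row>(std::istream_iterator<Row>(in),
                            std::istream_iterator<Row>());  // end-of-stream iterator
}

// Same function, fed from an in-memory string instead of a file.
std::vector<Row> readRowsFromText(const std::string& text) {
    std::istringstream in(text);
    return readRows(in);
}
```

Because readRows only sees a std::istream&, switching from the string to a real file means passing a std::ifstream and changes nothing else.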

And since we do not have files on SO, I use (in the example below) a std::istringstream to hold the input csv file. But of course you can open a file by defining a std::ifstream csvFile(filename). No problem.

We can now read the complete csv-file and split it into tokens and get all data, by simply defining a new variable and use again the range constructor.

std::vector cookBook(std::istream_iterator<Recipe>(sourceFile), {});

This very simple one-liner will read the complete csv-file and do all the expected work.

Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction").

Additionally, you can see that I do not use the "end()"-iterator explicitly.

This iterator is constructed from the empty brace-enclosed initializer list with the correct type: the std::vector range constructor requires both arguments to have the same type, so the type is taken from the first argument.
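
A small sketch that shows both points with plain int values instead of recipes (readNumbers is a hypothetical helper for this illustration and needs C++17): the static_assert confirms which type the compiler deduces, and {} stands in for the end-of-stream iterator.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: read all whitespace-separated integers from `text`.
std::vector<int> readNumbers(const std::string& text) {
    std::istringstream src(text);
    // CTAD: the element type comes from the iterator's value_type (int).
    // `{}` value-initializes the second argument as std::istream_iterator<int>,
    // which is the end-of-stream iterator.
    std::vector numbers(std::istream_iterator<int>(src), {});
    static_assert(std::is_same_v<decltype(numbers), std::vector<int>>);
    return numbers;
}
```

Writing std::vector<int> numbers(std::istream_iterator<int>(src), std::istream_iterator<int>{}); is the equivalent long form for compilers without C++17.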

I hope I could answer your basic question. Please see the full-blown C++ example below:

#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <list>
#include <vector>
#include <iterator>
#include <sstream>
#include <utility>

// Data types for ingredients and quantity
using Ingredients = std::pair<std::string, int>;

// Some helper functions
// trim: remove leading and trailing spaces
auto trim = [](const std::string & s) { return std::regex_replace(s, std::regex("^ +| +$"), ""); };
// split: cut at the last space; the left part is the name, the right part the quantity
auto split = [](const std::string & s) {size_t pos{ s.rfind(' ') }; return Ingredients(s.substr(0, pos), std::stoi(s.substr(pos))); };

std::regex separator{ ";" };

// Our recipe class
struct Recipe {
    // data
    std::string title{};
    std::list<Ingredients> ingredients{};

    // Overload the extractor
    friend std::istream& operator >> (std::istream& is, Recipe& r) {

        // We will read one line into this temporary
        std::string line{};
        if (std::getline(is, line)) {
            // Tokenize the base string
            std::vector token(std::sregex_token_iterator(line.begin(), line.end(), separator, -1), {});
            // get the recipe title
            r.title = token[0];
            // And, get the ingredients
            r.ingredients.clear();
            std::transform(std::next(token.begin()), token.end(), std::back_inserter(r.ingredients), 
                [](const std::string& s) { return split(trim(s)); });
        }
        return is;
    }

    // Overload the inserter
    friend std::ostream& operator << (std::ostream& os, const Recipe& r) {
        // Print one recipe
        os << "---- Recipe: " << r.title << "\n-- Ingredients:\n\n";
        for (const auto& [ingredient, quantity] : r.ingredients) 
            os << ingredient << " --> " << quantity << "\n";
        return os;
    }
};

// Source file with CSV data. I added "Red Pepper 2" to Salad
std::istringstream sourceFile{ R"(Salad;Tomatoe 50;Lettuce 100;Potatoe 60;Red Pepper 2;Onion 10
Macaroni;Macaroni 250;Tomatoe 60;Oil 10
Fish and chips;fish 30;potatoe 30;Oil 40)" };

int main() {
    // Read all data from the file with the following one-liner
    std::vector cookBook(std::istream_iterator<Recipe>(sourceFile), {});

    // Show some debug output
    std::copy(cookBook.begin(), cookBook.end(), std::ostream_iterator<Recipe>(std::cout, "\n"));
    return 0;
}

Again: What a pity that nobody will read this . . .

Upvotes: 1

Ted Lyngmo

Reputation: 117871

I suggest that you make a type out of the ingredient and amount part instead of using std::pair<std::string, unsigned>. With that you can add stream operators for that type too (and not risk it being used by a different std::pair<std::string, unsigned> than the one you want to support). It breaks the problem down somewhat and makes it simpler to implement / understand.

That being said, I suggest that you use something other than a space as the delimiter between the ingredient name and the amount, since the space complicates the parsing (as you can see in the code).

Here's an example with comments:

#include <cstdlib>
#include <iostream>
#include <list>
#include <sstream>
#include <string>
#include <tuple>

// a simple ingredient type
struct ingredient {
    std::string name{};
    unsigned amount{};
};

// read an ingredient "<name> <amount>"
std::istream& operator>>(std::istream& is, ingredient& i) {
    std::string entry;
    if(std::getline(is, entry, ';')) { // read until ; or end of input

        // find the last space in "entry"
        if(size_t pos = entry.rfind(' '); pos != std::string::npos) {

            // extract the trailing amount
            if(unsigned am = static_cast<unsigned>(
                   // Create a substring from the last space+1 and convert it to an
                   // unsigned (long). The static_cast<unsigned> silences a warning about
                   // the possibility to get the wrong value if it happens to be larger
                   // than an unsigned can hold.
                   std::strtoul(entry.substr(pos + 1).c_str(), nullptr, 10));
               // and check that we extracted something else than zero
               am != 0)
            {        // extracted the amount successfully
                i.name = entry.substr(0, pos); // put the name part in i.name
                i.amount = am;                 // and the amount part in i.amount
            } else { // extracting the amount resulted in 0
                // set failbit state on is
                is.setstate(std::ios::failbit);
            }
        } else { // no space found, set failbit
            is.setstate(std::ios::failbit);
        }
    }
    return is;
}

// output an ingredient
std::ostream& operator<<(std::ostream& os, const ingredient& i) {
    return os << i.name << " " << i.amount;
}

class recipe {
public:
    std::string const& name() const { return rname; }

    // convenience iterators to iterate over the ingredients, const
    auto begin() const { return ing.cbegin(); }
    auto end() const { return ing.cend(); }

    // non-const if you'd like to be able to change an ingredient property while iterating
    auto begin() { return ing.begin(); }
    auto end() { return ing.end(); }

private:
    std::list<ingredient> ing{};     // the new type in use
    std::string rname{};             // recipe name

    friend std::istream& operator>>(std::istream&, recipe&);
};

std::istream& operator>>(std::istream& i, recipe& other) {
    std::string line;
    if(std::getline(i, line)) {
        std::istringstream ss(line);
        if(std::getline(ss, other.rname, ';')) {
            // only read the recipe's name here and delegate reading each ingredient
            // to a temporary object of your new ingredient type
            other.ing.clear();             // remove any prior ingredients from other
            ingredient tmp;
            while(ss >> tmp) {             // extract as normal
                other.ing.push_back(tmp);  // and put in ing if successful
            }
        }
    }
    return i;
}

// output one recipe in the same format as it can be read
std::ostream& operator<<(std::ostream& os, const recipe& other) {
    os << other.name();
    for(auto& i : other) {
        os << ';' << i;
    }
    return os << '\n';
}

int main() {
    std::istringstream is(
        "Salad;Tomatoe 50;Fresh lettuce 100;Potatoe 60;Onion 10\n"
        "Macaroni;Macaroni 250;Tomatoe 60;Oil 10\n"
        "Fish and chips;fish 30;potatoe 30;Oil 40\n");
    recipe r;
    while(is >> r) {
        std::cout << r;
    }
}
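
To illustrate the suggestion about the delimiter, here is a sketch that assumes a hypothetical file format in which each entry is written as "<name>:<amount>", for example "Fresh lettuce:100". The type colon_ingredient and the ':' separator are inventions for this sketch and are not part of the question's format. With a dedicated separator there is no need to search for the last space.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical ingredient type for entries of the form "<name>:<amount>".
struct colon_ingredient {
    std::string name{};
    unsigned amount{};
};

// Read one entry up to ';' (or end of input) and split it at ':'.
std::istream& operator>>(std::istream& is, colon_ingredient& i) {
    std::string entry;
    if (std::getline(is, entry, ';')) {
        std::istringstream es(entry);
        colon_ingredient tmp;
        // the name is everything before ':', the amount is parsed as a number
        if (std::getline(es, tmp.name, ':') && es >> tmp.amount) {
            i = tmp;                         // only overwrite on success
        } else {
            is.setstate(std::ios::failbit);  // malformed entry
        }
    }
    return is;
}
```

Unlike the version above, this sketch also accepts an amount of 0, because success is judged by whether a number could be extracted and not by its value.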

Upvotes: 0
