jtotheakob

Reputation: 33

How to loop through vectors for specific strings

I am struggling to write a loop that takes a field of a vector and checks whether its value appears for the first time. If it does not, the loop should jump to the next vector until this field contains a new string.

My input file (.csvx) looks something like:

No.; ID; A; B; C;...;Z;
1;1_380; Value; Value; Value;...; Value;
2;1_380; Value; Value; Value;...; Value;
3;1_380; Value; Value; Value;...; Value;
...
41;2_380; Value; Value; Value;...; Value;
42;2_380; Value; Value; Value;...; Value;
...
400000; 6_392; Value; Value; Value;...; Value; 

Note: the file is relatively large.

I managed to parse my file into a vector<vector<string> >, splitting lines at semicolons so I can access any field. Now I would like to access the first "ID", i.e. 1_380, and store the parameters from the same line. Then I want to go to the next ID, 2_380, store those parameters again, and so on...

This is my code so far:

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <boost/algorithm/string.hpp>

using namespace std;

/*
 * CSVX Reader defined to fetch data from 
 * CSVX file into vectors
 */
class CSVXReader
{
   string fileName, delimiter;
public:
   CSVXReader(string filename, string delm = ";") :
   fileName(filename), delimiter(delm)
   {}
   vector<vector<string> > getData();           //Function to fetch data 
   };                                           //from CSVX file 

/*
 * Parse through CSVX file line by line 
 * and return the data in vector of vector
 * of strings
 */
vector<vector<string> > CSVXReader::getData()
{
   ifstream file(fileName);
   vector<vector<string> > dataList;               //Vector of vector 
                                                   //contains all data

   string line = "";                              
   while (getline(file, line))                  //Iterate through each line 
                                                //and split the content 
                                                //using delimiter
   {
      vector<string> vec;                       //Vector contains a row from 
                                                //input file 
      boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
      dataList.push_back(vec);
   }
   file.close();
   return dataList;
}


int main(int argc, char** argv) 
{
   CSVXReader reader("file.csvx");                     //Creating an object 
                                                       //of CSVXReader
   vector<vector<string> > dataList = reader.getData();//Get the data from 
                                                       //CSVX file
   for (const vector<string>& vec : dataList)          //Loop to go through 
                                                       //each line of 
                                                       //dataList 
                                                       //(vec1,vec2,vec3...)
   {
      // Pseudocode, this is the part I cannot express:
      // if (vec[1] contains "_" && "appears for the first time")
      //    {store parameters...}
      // else {go to next line}
   }
   return 0;
}

As you can see, I have no clue how to declare my loop properly... To be clear, I want to check the second field of each vector "vec". If it is new, store the data of that line. If not, jump to the next line (i.e. the next vector) until a new ID appears.

Looking forward to any advice!

Upvotes: 1

Views: 564

Answers (3)

jtotheakob

Reputation: 33

Ok fellas, I was playing around with my code and realized that @Armin's second solution (the modified while loop) doesn't handle unordered lists. If an ID shows up again much later, it is only compared with the previous element (oldValue), so it gets inserted again even though it already exists in my container...

After some reading (and obviously more has to come), I lean towards @Paul's unordered_set. My first question arises right here: why didn't you suggest set instead? From what I found, unordered_set is apparently faster for search operations. With my limited knowledge this is difficult to understand... but I don't want to dig too deep here. Is this your reason, or are there other advantages that I missed?

Despite your suggestion, I tried to use set, which seems better in my situation because it is more ordered. And again my code refuses to compile:

set<vector<string> > CSVReader::getData() {

    ifstream file(fileName);

    set<vector<string> > container;

    string line = "";
    string uniqueValue{};

    while (getline(file, line))                          //Iterate through each line and split the content using delimiter
    {
        //Vector contains a row from RAO file
        vector<string> vec;                        
        boost::algorithm::split(vec, line, boost::is_any_of(delimiter));

        uniqueValue = vec[2];

        //Line (or vector) is added to container if the uniqueValue, e.g. 1_380, appears for the first time                   

        if(!container.count(uniqueValue))
        {
            container.insert(vec);
        }

    }

    file.close();
    return container;  
}

The error says:

error: no matching function for call to 'std::set<std::vector<std::__cxx11::basic_string<char> > >::count(std::__cxx11::string&)'
     if(!localDetails.count(localDetail))

Since I followed your example, what did I do wrong?

PS: I'm just reading about SO policies... I hope this additional question is acceptable, though.

Upvotes: 0

A M

Reputation: 15277

Basically you do not need all the code that the other answers provide. You need just one statement to copy the data where you want it.

Let us assume that you have already read your data into your dataList, and that you have defined a new std::vector<std::vector<std::string>> parameter{}; where you want to store the unique result.

The algorithm library has a function called std::copy_if. It copies an element only if a predicate (a condition) is true. Your condition is that the ID of a line differs from the ID of the previous line. In that case it is a new line with new data, and you copy it. If a line has the same ID as its previous line, you do not copy it.

So we remember the ID from the last line and compare the ID of the next line with that stored value. If it is different, we store the parameters; if not, we don't. After each check, we assign the current value to the last value. As the initial "last value" we use an empty string, so the first line will always be different. The statement then looks like this:

std::copy_if(dataList.begin(), dataList.end(), std::back_inserter(parameter),
    [lastID = std::string{}](const std::vector<std::string> & sv) mutable {
        bool result = (lastID != sv[1]);
        lastID = sv[1];
        return result;
    }
);

So we copy all rows from the beginning to the end of dataList into the parameter vector if, and only if, the second string in the source vector (index 1) differs from the value we remembered from the previous row.

Rather straightforward.

An additional optimization would be to filter out the correct parameters immediately while reading. Instead of storing the complete vector with all data in the first place, you store only the necessary data. This can reduce the required memory considerably.

Modify your while loop to:

string line = "";                              
string oldValue{};
while (getline(file, line))                 //Iterate through each line 
                                            //and split the content 
                                            //using delimiter
{
    vector<string> vec;                       //Vector contains a row from 
                                                //input file 
    boost::algorithm::split(vec, line, boost::is_any_of(delimiter));

    if (oldValue != vec[1]) {
        dataList.push_back(vec);
    }
    oldValue = vec[1];
}

With that you get it right from the beginning.

An additional solution looks like this:

#include <vector>
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <fstream>
#include <sstream>
#include <algorithm>

std::istringstream testFile{R"(1;1_380; Value1; Value2; Value3; Value4
2;1_380; Value5; Value6; Value7; Value8
3;1_380; Value9 Value10 
41;2_380; Value11; Value12; Value13
42;2_380; Value15
42;2_380; Value16
500;3_380; Value99
400000; 6_392; Value17; Value18; Value19; Value20
400001; 6_392; Value21; Value22; Value23; Value24)"
};


class LineAsVector {    // Proxy for the input Iterator
public:
    // Overload extractor. Read a complete line
    friend std::istream& operator>>(std::istream& is, LineAsVector& lv) {

        // Read a line
        std::string line; lv.completeLine.clear();
        std::getline(is, line); 

        // The delimiter
        const std::regex re(";");

        // Split values and copy into resulting vector
        std::copy(  std::sregex_token_iterator(line.begin(), line.end(), re, -1),
                    std::sregex_token_iterator(),
                    std::back_inserter(lv.completeLine));
        return is; 
    }

    // Conversion operator: LineAsVector to std::vector<std::string>
    operator std::vector<std::string>() const { return completeLine; }
protected:
    // Temporary to hold the read vector
    std::vector<std::string> completeLine{};
};

int main()
{

    // This is the resulting vector which will contain the result
    std::vector<std::vector<std::string>> parameter{};


    // One copy statement to copy all necessary data from the file to the parameter list
    std::copy_if (
        std::istream_iterator<LineAsVector>(testFile),
        std::istream_iterator<LineAsVector>(),
        std::back_inserter(parameter),
        [lastID = std::string{}](const std::vector<std::string> & sv) mutable {
            bool result = (lastID != sv[1]);
            lastID = sv[1];
            return result;
        }
    );


    // For debug purposes: Show result on screen
    std::for_each(parameter.begin(), parameter.end(), [](std::vector<std::string> & sv) {
        std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, " "));
        std::cout << '\n';
        } 
    );
    return 0;
}

Please note: in function main, we do everything in one statement, std::copy_if. The source in this case is a std::istream, so it can be a std::ifstream (a file) or whatever you want. On SO I use a std::istringstream because I cannot use files here, but it works the same way: just replace the variable in the std::istream_iterator. We iterate over the file with the std::istream_iterator.

What a pity that nobody will read this...

Upvotes: 1

PaulMcKenzie

Reputation: 35440

Since you wrote pseudo-code, it is difficult to write real code.

But in general, if you want to detect if an item has occurred already, you can utilize a std::unordered_set to implement the "appears for the first time".

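Adapting your pseudo-code:

#include <string>
#include <unordered_set>
#include <vector>
//...
std::unordered_set<std::string> stringSet;
//...
for(vector<string>& vec : dataList)
{
    // vec[1] is the ID; "contains '_'" becomes a find() check
    if(vec.size() > 1 && vec[1].find('_') != std::string::npos && !stringSet.count(vec[1]))
    {
         //... store parameters
         stringSet.insert(vec[1]);
    }
}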

The condition checks whether the item is already in the unordered_set. If it is, we skip it; if not, we process the item and add it to the unordered_set.

Upvotes: 3
