syd

Reputation: 355

C++ how to use fstream to read tab-delimited file with spaces

I need to use some C++ code to read a tab-delimited text file. The file contains three columns, and the second column contains strings with spaces. Below is a sample of the file.

1   hellow world    uid_1
2   good morning    uid_2

The following is the C++ code I need to use to read the file. However, it doesn't read the file correctly when it reaches the space in the string.

Any suggestions on how to modify the while loop to make it work? I'm not familiar with C++. Please provide detailed code. Thanks!

#include <Rcpp.h>
#include <iostream>
#include <fstream>
#include <string>

std::ifstream infile (file_name.c_str());

int row = -1; 
std::string col;
std::string uid;


while (infile >> row >> col >> uid) {

    // ... operations on row, col and uid ...

}

Upvotes: 6

Views: 23430

Answers (3)

shane abraham

Reputation: 3

You can also use vectors and store the contents in the following manner:

#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <fstream>
 

 
std::vector<std::string> StringToVector(std::string, 
        char separator);
 
// ----- END OF PROBLEM FUNCTION PROTOTYPE -----
 
int main()
{
    std::ifstream readFromFile;
    std::string txtFromFile = "";
       
    // Open the file for reading
    readFromFile.open("test.txt", std::ios_base::in);
    
    if(readFromFile.is_open()){
        
        // Read the file one line at a time; the loop ends as soon
        // as getline fails (end of file or a read error), so the
        // last line is not processed twice
        while(std::getline(readFromFile, txtFromFile)){
        
            // Split the line on tabs so spaces inside a field are kept
            std::vector<std::string> vect = 
                    StringToVector(txtFromFile, '\t');
            
            for(std::size_t i=0;i<vect.size();i++){
                std::cout<<vect[i]<<"\t";
            }
            std::cout<<"\n\n";
        }   
        readFromFile.close();
    }
    
    return 0;
}
 
// ----- PROBLEM FUNCTION -----
 
std::vector<std::string> StringToVector(std::string theString, 
        char separator){
 
    // Create a vector to hold the fields
    std::vector<std::string> vecsWords;
    
    // Wrap the line in a stringstream so that getline
    // can pull the fields out one at a time
    std::stringstream ss(theString);
    
    // Will temporarily hold each field in the string
    std::string sIndivStr;
    
    // While there are more fields to extract keep
    // executing: getline copies characters into sIndivStr
    // until it reaches the separator (a tab here), then
    // discards the separator, so spaces inside a field
    // are preserved
    while(getline(ss, sIndivStr, separator)){
        
        // Put the field into the vector
        vecsWords.push_back(sIndivStr);
    }
    
    return vecsWords;
}

Upvotes: 0

Loki Astari

Reputation: 264361

It's hard to do this directly, because you need to combine formatted (operator>>) and unformatted (std::getline) input routines.

You want to use operator>> to read the id field (and correctly parse an integer), but then you also want to use std::getline() with its third parameter set to '\t' to read a tab-delimited field (note: the delimiter defaults to '\n', i.e. line-delimited values).

Normally you don't want to mix operator>> and std::getline(), because they handle whitespace differently: operator>> stops at the first whitespace character and leaves it in the stream, while std::getline() reads up to its delimiter and consumes it.

So the best solution is to write your own input operator and handle that extra space explicitly in a controlled manner.

How to do it:

I would create a class to represent the line.

#include <istream>
#include <string>
#include <utility>

struct Line
{
    int          id;
    std::string  col;
    std::string  uid;

    void swap(Line& other) noexcept {
        using std::swap;
        swap(id, other.id);
        swap(col, other.col);
        swap(uid, other.uid);
    }
    friend std::istream& operator>>(std::istream& in, Line& data);
};

Then you need to define an input operator for reading the line.

std::istream& operator>>(std::istream& in, Line& data)
{
    Line   tmp;
    if (// 1 Read the id. Then discard the white space before the second field.
        (in >> tmp.id >> std::ws) &&
        // 2 Read the second field (which is terminated by a tab)
        std::getline(in, tmp.col, '\t') &&
        // 3 Read the third field (which is terminated by a newline)
        std::getline(in, tmp.uid)
        // I am being lazy on 3; you may want to be more specific.
       )
    {
        // We have correctly read all the data we need from
        // the line so set the data object from the tmp value.
        data.swap(tmp);
    }
    // If any step failed, the stream's fail state is already set,
    // so a loop like `while (infile >> line)` stops.
    return in;
}

Now it can be used easily.

Line line;
while (infile >> line) {

    // ... operations on line.id, line.col and line.uid ...

}

Upvotes: 2

Marcin

Reputation: 238111

One way would be as follows:

#include <iostream>
#include <vector>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

using namespace std;

// taken from http://stackoverflow.com/a/236803/248823
void split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss;
    ss.str(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
}

int main() {
    std::ifstream infile ("./data.asc");

    std::string line;

    // Read one whole line at a time, then split it on tabs
    while (std::getline(infile, line))
    {
        vector<string> row_values;

        split(line, '\t', row_values);

        for (const auto& v : row_values)
            cout << v << ',' ;

        cout << endl;
     }

    return 0;
}

Results in:

1,hellow world,uid_1,
2,good morning,uid_2,

Note the trailing comma. I'm not sure what you want to do with the values from the file, so I just made it as simple as possible.

Upvotes: 4
