Kiran

Reputation: 5526

Break a long string into multiple strings in C++

I have a string that I receive from a third party. The string is actually the contents of a text file, and it may use either UNIX LF or Windows CRLF line terminators. How can I break it into multiple strings, ignoring blank lines? I was planning to do the following, but I am not sure whether there is a better way. All I need to do is read it line by line; the vector here is just a convenience, and I can avoid it. (Unfortunately, I do not have access to the actual file. I only receive the string object.)

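string textLine;          // the text received from the third party
vector<string> tokens;

size_t pos = 0;
while( pos < textLine.size() ) {
    // LF ends every line; a CRLF line still has its '\r' at this point
    size_t nextPos = textLine.find( '\n', pos );
    if( nextPos == string::npos )
        nextPos = textLine.size();   // the last line may have no terminator
    string line = textLine.substr( pos, nextPos - pos );
    if( !line.empty() && line[line.size() - 1] == '\r' )
        line.erase( line.size() - 1 );   // strip the CR of a CRLF ending
    if( !line.empty() )
        tokens.push_back( line );        // ignore blank lines
    pos = nextPos + 1;
}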

Upvotes: 6

Views: 1867

Answers (6)

James Kanze

Reputation: 153977

A lot depends on what is already present in your toolkit. I work a lot with files which come from Windows and are read under Unix, and vice versa, so I have most of the tools for converting CRLF into LF at hand. If you don't have any, you might want a function along the lines of:

void addLine( std::vector<std::string>& dest, std::string line )
{
    if ( !line.empty() && *(line.end() - 1) == '\r' ) {
        line.erase( line.end() - 1 );
    }
    if ( !line.empty() ) {
        dest.push_back( line );
    }
}

to do your insertions. As for breaking the original text into lines, you can use std::istringstream and std::getline, as others have suggested; it's simple and straightforward, even if it is overkill. (The std::istringstream is a fairly heavy mechanism, since it supports all sorts of input conversions you don't need.) Alternatively, you might consider a loop along the lines of:

std::string::const_iterator start = textLine.begin();
std::string::const_iterator end   = textLine.end();
std::string::const_iterator next  = std::find( start, end, '\n' );
while ( next != end ) {
    addLine( tokens, std::string( start, next ) );
    start = next + 1;
    next = std::find( start, end, '\n' );
}
addLine( tokens, std::string( start, end ) );
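For reference, here is a sketch that puts addLine and the std::find loop together into one self-contained unit. The wrapper name splitNonBlankLines is a hypothetical name chosen for this illustration, not part of the answer above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Appends line to dest, first dropping a trailing '\r' (CRLF input)
// and then skipping the line entirely if nothing is left.
void addLine( std::vector<std::string>& dest, std::string line )
{
    if ( !line.empty() && *(line.end() - 1) == '\r' ) {
        line.erase( line.end() - 1 );
    }
    if ( !line.empty() ) {
        dest.push_back( line );
    }
}

// Hypothetical wrapper: splits text on '\n' and collects the non-blank lines.
std::vector<std::string> splitNonBlankLines( std::string const& text )
{
    std::vector<std::string> tokens;
    std::string::const_iterator start = text.begin();
    std::string::const_iterator end   = text.end();
    std::string::const_iterator next  = std::find( start, end, '\n' );
    while ( next != end ) {
        addLine( tokens, std::string( start, next ) );
        start = next + 1;
        next = std::find( start, end, '\n' );
    }
    // The last line may not be terminated by '\n'.
    addLine( tokens, std::string( start, end ) );
    return tokens;
}
```

Given "first\r\n\r\nsecond\n\nthird", this yields the three strings first, second and third: the CR of each CRLF line is stripped, and both blank lines are dropped.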

Or you could break things down into separate operations:

textLine.erase(
    std::remove( textLine.begin(), textLine.end(), '\r'),
    textLine.end() );

to get rid of all of the CRs,

std::vector<std::string> tokens( split( textLine, '\n' ) );

to break it up into lines, where split is a generalized function along the lines of the loop above (a useful tool to add to your toolkit), and finally:

tokens.erase(
    std::remove_if( tokens.begin(), tokens.end(),
                    // drop every line that is now empty
                    []( std::string const& s ) { return s.empty(); } ),
    tokens.end() );

(Generally speaking: if this is a one-off situation, use the std::istringstream-based solution. If you think you may have to do something like this from time to time in the future, add the split function to your toolkit and use it.)
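The three separate operations can be combined roughly as follows. This is only a sketch: split here is a hypothetical helper written for illustration (any equivalent generalized split would do), and a C++11 lambda is used as the emptiness predicate instead of boost::bind.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical generic split: breaks text at every occurrence of sep.
// Empty fields are kept, so a later pass can decide what to drop.
std::vector<std::string> split( std::string const& text, char sep )
{
    std::vector<std::string> result;
    std::string::const_iterator start = text.begin();
    std::string::const_iterator next  = std::find( start, text.end(), sep );
    while ( next != text.end() ) {
        result.push_back( std::string( start, next ) );
        start = next + 1;
        next = std::find( start, text.end(), sep );
    }
    result.push_back( std::string( start, text.end() ) );
    return result;
}

// The three steps: remove all CRs, split on LF, then drop blank lines.
std::vector<std::string> nonBlankLines( std::string text )
{
    text.erase( std::remove( text.begin(), text.end(), '\r' ), text.end() );
    std::vector<std::string> tokens( split( text, '\n' ) );
    tokens.erase(
        std::remove_if( tokens.begin(), tokens.end(),
                        []( std::string const& s ) { return s.empty(); } ),
        tokens.end() );
    return tokens;
}
```

Note that removing every '\r' up front also removes a lone CR in the middle of a line, which differs from stripping only the CR just before each LF; for ordinary text files this rarely matters.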

Upvotes: 1

Robᵩ

Reputation: 168716

I'd use getline to create new strings based on \n, and then manipulate the line endings.

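#include <sstream>
#include <string>
#include <vector>
using namespace std;

string textLine;           // the text received from the third party
vector<string> tokens;

istringstream sTextLine(textLine);   // read the string as if it were a stream
string line;
while(getline(sTextLine, line)) {
  if(line.empty()) continue;                                  // blank LF line
  if(line[line.size()-1] == '\r') line.resize(line.size()-1); // strip CR of CRLF
  if(line.empty()) continue;                                  // blank CRLF line
  tokens.push_back(line);
}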

EDIT: Use istringstream instead of stringstream.
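Wrapped up as a function, the same getline-based approach might look like the sketch below; the function name getNonBlankLines is hypothetical. The key detail is that the istringstream must be constructed from the text it is supposed to read.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text into lines with std::getline, trims a trailing '\r'
// left over from CRLF endings, and skips lines that end up empty.
std::vector<std::string> getNonBlankLines( std::string const& text )
{
    std::istringstream in( text );
    std::vector<std::string> tokens;
    std::string line;
    while ( std::getline( in, line ) ) {
        if ( !line.empty() && line[line.size() - 1] == '\r' )
            line.erase( line.size() - 1 );
        if ( !line.empty() )
            tokens.push_back( line );
    }
    return tokens;
}
```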

Upvotes: 4

Jonathan Geisler

Reputation: 472

I would put the string in a stringstream and then use the std::getline function, as the previous answer mentioned. That way, you can act as if you were reading the text from a file, even though it really comes from another string.
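A sketch of this idea, assuming a hypothetical function readNonBlankLines that accepts any std::istream&: because both std::ifstream and std::istringstream derive from std::istream, the same code serves a file or a string. Unlike a file opened in text mode on Windows, a string stream performs no newline translation, so the sketch still strips a trailing '\r'.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads non-blank lines from any input stream; the caller can pass an
// std::ifstream for a file or an std::istringstream wrapping a string.
std::vector<std::string> readNonBlankLines( std::istream& in )
{
    std::vector<std::string> tokens;
    std::string line;
    while ( std::getline( in, line ) ) {
        // Drop the CR of a CRLF ending; string streams do no newline translation.
        if ( !line.empty() && line[line.size() - 1] == '\r' )
            line.erase( line.size() - 1 );
        if ( !line.empty() )
            tokens.push_back( line );
    }
    return tokens;
}
```

With the received text in a std::string named text, a call would be std::istringstream in( text ); followed by readNonBlankLines( in ).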

Upvotes: 0

Martin Stone

Reputation: 13007

I would use the approach given here (std::getline on a std::istringstream)...

Splitting a C++ std::string using tokens, e.g. ";"

... except omit the ';' parameter to std::getline.

Upvotes: 2

helpermethod

Reputation: 62244

You could use strtok.

Split string into tokens

A sequence of calls to this function splits str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters.
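strtok needs a writable, NUL-terminated buffer and modifies it, so it cannot be applied directly to a const std::string. A minimal sketch under that constraint, using a hypothetical helper splitWithStrtok that copies the text first:

```cpp
#include <cstring>
#include <string>
#include <vector>

// strtok modifies its input and needs a NUL-terminated buffer, so work on
// a writable copy of the string rather than on the string's own data.
// Using "\r\n" as the delimiter set means runs of CR/LF characters act
// as a single separator, so blank lines never produce a token.
std::vector<std::string> splitWithStrtok( std::string const& text )
{
    std::vector<char> buffer( text.begin(), text.end() );
    buffer.push_back( '\0' );
    std::vector<std::string> tokens;
    for ( char* tok = std::strtok( &buffer[0], "\r\n" );
          tok != nullptr;
          tok = std::strtok( nullptr, "\r\n" ) ) {
        tokens.push_back( tok );
    }
    return tokens;
}
```

Keep in mind that strtok keeps hidden state between calls, so it is not safe to use from several threads at once or to interleave two tokenizations.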

Upvotes: 0

Michael Kristofik

Reputation: 35188

You could use std::getline as you're reading from the file instead of reading the whole thing into a string. That will break things up line by line by default. You can simply not push_back any string that comes up empty.

string line;
vector<string> tokens;

while (getline(file, line))
{
    if (!line.empty()) tokens.push_back(line);
}

UPDATE:

If you don't have access to the file, you can use the same code by initializing a stringstream with the whole text. std::getline works on all stream types, not just files.

Upvotes: 6
