Reputation: 193

Parse string with delimiter whitespace but having strings include whitespace as well?

I have a text file with state names and their respective abbreviations. It looks something like this:

Florida FL
Nevada      NV
New York     NY

So the number of whitespace characters between the state name and the abbreviation varies. I want to extract the name and the abbreviation. I thought about using getline with whitespace as a delimiter, but that breaks on names that contain whitespace themselves, like "New York". What function could I use instead?

Upvotes: 0

Answers (2)

A M

Reputation: 15277

The systematic way is to analyze all possible input data and then look for a pattern in the text. In your case, each line has this structure:

at the end of the line there are some consecutive uppercase letters (the abbreviation)
before that comes the state's name

So, if we search for the state abbreviation pattern and split it off, what remains is the full name of the state, possibly with leading and trailing spaces. We remove those and have the result.

For searching we use a std::regex. The pattern is: one or more uppercase letters, followed by zero or more whitespace characters, followed by the end of the line. Written as a C++ string literal, the regular expression is: "([A-Z]+)\\s*$"

When this matches, the prefix of the match result contains the full state name. We remove the leading and trailing spaces and that's it.

Please see:

#include <iostream>
#include <string>
#include <sstream>
#include <regex>

std::istringstream textFile(R"(   Florida FL
  Nevada      NV
New York     NY)");

std::regex regexStateAbbreviation("([A-Z]+)\\s*$");

int main()
{
    // Split off some parts
    std::smatch stateAbbreviationMatch{};
    std::string line{};

    while (std::getline(textFile, line)) {
        if (std::regex_search(line, stateAbbreviationMatch, regexStateAbbreviation))
        {
            // Get the state
            std::string state(stateAbbreviationMatch.prefix());
            // Remove leading and trailing spaces and collapse runs of spaces into one
            state = std::regex_replace(state, std::regex("^ +| +$|( ) +"), "$1");

            // Get the state abbreviation (capture group 1, so any trailing whitespace is excluded)
            std::string stateabbreviation(stateAbbreviationMatch[1]);

            // Print Result
            std::cout << stateabbreviation << ' ' << state << '\n';
        }
    }
    return 0;
}

Upvotes: 0

Some programmer dude

Reputation: 409156

You know that the abbreviation is always two characters.

So you can read the whole line, and split it at two characters from the end (probably using substr).

Then trim the first string and you have two nice strings for the name and abbreviation.

Upvotes: 1
