Reputation: 193
I have a text file with state names and their respective abbreviations. It looks something like this:
Florida FL
Nevada NV
New York NY
The amount of whitespace between the state name and the abbreviation varies. I want to extract both the name and the abbreviation. I thought about using getline with a space as the delimiter, but that breaks on names that contain a space, such as "New York". What function could I use instead?
Upvotes: 0
Views: 156
Reputation: 15277
The systematic way is to analyze all possible input data and then look for a pattern in the text. In your case, the analysis shows that the abbreviation consists of uppercase letters and always sits at the end of the line, possibly followed by whitespace.
So, if we search for the state abbreviation pattern and split it off, the rest of the line is the full name of the state, possibly with leading and trailing spaces. We remove those, and the result is there.
For searching we will use a std::regex. The pattern is: one or more uppercase letters, followed by zero or more whitespace characters, followed by the end of the line. The regular expression for that is: "([A-Z]+)\\s*$"
Once a match is found, the prefix of the match result contains the full state name. We remove leading and trailing spaces and that's it.
Please see:
#include <iostream>
#include <string>
#include <sstream>
#include <regex>

// Stand-in for the real text file
std::istringstream textFile(R"( Florida FL
Nevada NV
New York NY)");

// Uppercase letters at the end of the line, optionally followed by whitespace
std::regex regexStateAbbreviation("([A-Z]+)\\s*$");

int main()
{
    std::smatch stateAbbreviationMatch{};
    std::string line{};
    while (std::getline(textFile, line)) {
        // Split off the abbreviation at the end of the line
        if (std::regex_search(line, stateAbbreviationMatch, regexStateAbbreviation))
        {
            // Everything before the match is the state name
            std::string state(stateAbbreviationMatch.prefix());
            // Remove leading and trailing spaces, collapse inner runs of spaces
            state = std::regex_replace(state, std::regex("^ +| +$|( ) +"), "$1");
            // Capture group 1 is the abbreviation without trailing whitespace
            std::string stateabbreviation(stateAbbreviationMatch[1]);
            // Print result
            std::cout << stateabbreviation << ' ' << state << '\n';
        }
    }
    return 0;
}
Upvotes: 0
Reputation: 409156
You know that the abbreviation is always two characters.
So you can read the whole line, and split it at two characters from the end (probably using substr).
Then trim the first string and you have two nice strings for the name and abbreviation.
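A minimal sketch of that idea, assuming every line ends in a two-letter abbreviation separated from the name by whitespace. The names trim and splitStateLine are made up for this example and are not standard library functions:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split a line such as "New York   NY" into name and
// abbreviation. Assumes the abbreviation is exactly the last two
// non-whitespace characters and is preceded by whitespace.
// Returns false if the line does not have that shape.
bool splitStateLine(const std::string& rawLine,
                    std::string& name, std::string& abbreviation) {
    const std::string line = trim(rawLine);  // drop trailing spaces first
    if (line.size() < 3) return false;       // too short for name + abbreviation
    const std::size_t cut = line.size() - 2; // start of the abbreviation
    if (!std::isspace(static_cast<unsigned char>(line[cut - 1]))) return false;
    abbreviation = line.substr(cut);         // last two characters
    name = trim(line.substr(0, cut));        // the rest, without padding
    return true;
}
```

You would call splitStateLine on each line read with std::getline and skip lines for which it returns false. Trimming the whole line first matters: if a line has trailing spaces, the last two characters would otherwise not be the abbreviation.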
Upvotes: 1