Reputation: 3234
I have a function that reads a CSV file line by line and splits each line into a vector. The code to do this is:
std::stringstream ss(sText);
std::string item;
while(std::getline(ss, item, ','))
{
    m_vecFields.push_back(item);
}
This works fine unless the line ends with an empty value. For example:
text1,tex2,
I want this to produce a vector of size 3 whose third element is empty. Instead, it produces a vector of size 2. How can I correct this?
Upvotes: 5
Views: 15281
Reputation: 38949
C++11 makes it exceedingly easy to handle even escaped commas using regex_token_iterator:
std::stringstream ss(sText);
std::string item;
const std::regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};

std::getline(ss, item); // read one line of CSV text
m_vecFields.insert(m_vecFields.end(),
                   std::sregex_token_iterator(item.begin(), item.end(), re, 1),
                   std::sregex_token_iterator());
Incidentally, if you simply wanted to construct a vector<string> from a CSV string such as item, you could just do:
const std::regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
std::vector<std::string> m_vecFields{std::sregex_token_iterator(item.begin(), item.end(), re, 1),
                                     std::sregex_token_iterator()};
Some quick explanation of the regex is probably in order. (?:[^\\\\,]|\\\\.) matches escaped characters or non-',' characters. (See here for more info: https://stackoverflow.com/a/7902016/2642059) The *? means that it is not a greedy match, so it will stop at the first ',' reached. All that's nested in a capture, which is selected by the last parameter, the 1, to regex_token_iterator. Finally, (?:,|$) will match either the ','-delimiter or the end of the string.
If you would rather not match an empty trailing field, the regex can be changed to:
const regex re{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};
Notice the '+' has now replaced the '*', indicating that one or more matching characters are required. This prevents it from matching the empty field at the end of an item string that ends with a ','. You can see an example of this here: http://ideone.com/W4n44W
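To compare the two patterns on the input from the question, here is a minimal sketch. The helper names split_with, split_keep_empty, split_skip_empty and check_question_input are hypothetical and introduced only for illustration; the only library facilities used are std::regex and std::sregex_token_iterator from <regex>.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): returns capture
// group 1 of every match of `re` in `line`.
std::vector<std::string> split_with(const std::string& line, const std::regex& re) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), re, 1),
        std::sregex_token_iterator());
}

// '*' variant: a field may be empty, so a trailing ',' yields an empty field.
std::vector<std::string> split_keep_empty(const std::string& line) {
    static const std::regex re("((?:[^\\\\,]|\\\\.)*?)(?:,|$)");
    return split_with(line, re);
}

// '+' variant: a field needs at least one character, so empty fields are skipped.
std::vector<std::string> split_skip_empty(const std::string& line) {
    static const std::regex re("((?:[^\\\\,]|\\\\.)+?)(?:,|$)");
    return split_with(line, re);
}

// Checks both variants against the line from the question.
void check_question_input() {
    const std::string line = "text1,tex2,";
    assert(split_keep_empty(line).size() == 3);     // "text1", "tex2", ""
    assert(split_keep_empty(line).back().empty());
    assert(split_skip_empty(line).size() == 2);     // "text1", "tex2"
}
```

Calling check_question_input() from main exercises both variants. One thing worth testing in your own environment: as far as I can tell from how regex_iterator advances, the '*' pattern may also report one extra empty match at the end of a line that does not end in ',', so it may not distinguish "text1,tex2" from "text1,tex2," on its own.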
Upvotes: 2
Reputation: 243
A flexible solution for parsing CSV files, where:
source - content of the CSV file
delimeter - the CSV delimiter, e.g. ',' or ';'
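#include <string>
#include <vector>

std::vector<std::string> csv_split(const std::string& source, char delimeter) {
    std::vector<std::string> ret;
    std::string word;
    bool inQuote = false;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (!inQuote && source[i] == '"') {
            inQuote = true;  // opening quote: start of a quoted section
            continue;
        }
        if (inQuote && source[i] == '"') {
            // Check the bounds before looking at the next character.
            if (i + 1 < source.size() && source[i + 1] == '"') {
                ++i;  // doubled quote inside quotes: keep one literal '"'
            } else {
                inQuote = false;  // closing quote
                continue;
            }
        }
        if (!inQuote && source[i] == delimeter) {
            ret.push_back(word);  // delimiter outside quotes ends the field
            word.clear();
        } else {
            word += source[i];
        }
    }
    ret.push_back(word);  // last field, possibly empty
    return ret;
}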
Upvotes: 2
Reputation: 10350
You could just use boost::split to do all this for you.
http://www.boost.org/doc/libs/1_50_0/doc/html/string_algo/usage.html#id3207193
It has the behaviour that you require in one line.
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using namespace std;

int main()
{
    vector<string> strs;
    boost::split(strs, "please split,this,csv,,line,", boost::is_any_of(","));

    for (vector<string>::iterator it = strs.begin(); it != strs.end(); ++it)
        cout << "\"" << *it << "\"" << endl;

    return 0;
}

Output:
"please split"
"this"
"csv"
""
"line"
""
Upvotes: 4
Reputation: 3092
You can use a function similar to this:
template <class InIt, class OutIt>
void Split(InIt begin, InIt end, OutIt splits)
{
    InIt current = begin;  // start of the field being scanned
    while (begin != end)
    {
        if (*begin == ',')
        {
            *splits++ = std::string(current, begin);
            current = ++begin;  // next field starts after the delimiter
        }
        else
            ++begin;
    }
    *splits++ = std::string(current, begin);  // last field, possibly empty
}
It iterates through the string and, whenever it encounters the delimiter, extracts the current field and writes it through the splits output iterator. The final field, which may be empty, is written after the loop.
The interesting part is
You can use it like this:
std::stringstream ss(sText);
std::string item;
std::vector<std::string> m_vecFields;
while (std::getline(ss, item))
{
    Split(item.begin(), item.end(), std::back_inserter(m_vecFields));
}
std::for_each(m_vecFields.begin(), m_vecFields.end(), [](const std::string& value)
{
    std::cout << value << std::endl;
});
Upvotes: 2
Reputation: 55425
bool addEmptyLine = !sText.empty() && sText.back() == ','; // back() on an empty string is undefined
/* your code here */
if (addEmptyLine) m_vecFields.push_back("");
or
sText += ','; // "text1,tex2," becomes "text1,tex2,,"
/* your code */
assert(m_vecFields.size() == 3);
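The first variant can be wrapped into a complete function. The sketch below is one way to do it, assuming a single line of comma-separated text with no quoting; the names split_line and check_split_line are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line on ',' with std::getline and
// keeps a trailing empty field.
std::vector<std::string> split_line(const std::string& sText) {
    std::vector<std::string> fields;
    std::stringstream ss(sText);
    std::string item;
    while (std::getline(ss, item, ',')) {
        fields.push_back(item);
    }
    // Once the final ',' has been consumed the stream is at end-of-file, so
    // the next getline extracts nothing and fails instead of yielding the
    // empty field that follows the delimiter. Add that field by hand.
    if (!sText.empty() && sText.back() == ',') {
        fields.push_back("");
    }
    return fields;
}

// Checks the cases discussed in the question.
void check_split_line() {
    assert(split_line("text1,tex2,").size() == 3);   // trailing empty field kept
    assert(split_line("text1,tex2,").back().empty());
    assert(split_line("text1,tex2").size() == 2);    // no trailing ',' -> unchanged
    assert(split_line("a,,b").size() == 3);          // empty middle field already works
}
```

Calling check_split_line() from main runs the checks. Note that std::string::back requires C++11, as does the original snippet.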
Upvotes: 2