pausag

Reputation: 146

boost::gregorian input_facet unexpected results

I have a question about reading a boost::gregorian::date object from a formatted string. When the input string matches the specified format, it works as expected. For example, the code below

std::string fmt = "%Y-%m-%d";
std::string date_str = "2008-10-23";
boost::gregorian::date date;

boost::gregorian::date_input_facet* i_facet(new boost::gregorian::date_input_facet());
i_facet->format(fmt.c_str());
std::stringstream ss;
ss.exceptions(std::ios_base::failbit);
ss.imbue(std::locale(ss.getloc(), i_facet));
ss << date_str;

ss >> date;

std::cout << date << std::endl;

produces the correct output:

2008-Oct-23

However, if the input string does not match the format, streaming it into the date object silently produces a wrong result. With all other code unchanged and the input string set to

std::string date_str = "20081023";

the output is 2008-Feb-01.

So the question is: why does it produce a wrong result instead of throwing an exception, even though exceptions are enabled for failbit?

I have experimented with different formats and input strings, and it seems that any mixture of delimiter characters is accepted; the result only goes wrong when there are no delimiters at all, as in the example above. Neither the Boost documentation nor the source code led me to an explanation.

*Compiled with g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2, boost version 1.55

Upvotes: 3

Views: 231

Answers (1)

sehe

Reputation: 393694

Yes, I agree the behaviour is weird.

What's happening is that the parser never validates the separator characters, at all! The relevant code is in boost::date_time::format_date_parser (it was shown as a screenshot in the original post).

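Instead, wherever the format contains a literal character, the code just skips one input character blindly, assuming it is a separator. This means that in 20081023, after 2008 has been read for %Y, the 1 is consumed in place of the first - in the format specification.

Next up, two digits (02) are taken for the %m specifier (so, Feb).

Finally, the 3 is consumed in place of the second - separator. At that point the input is exhausted, so nothing is left for %d. Apparently, all fields are treated as optional and hence the unspecified day defaults to 1.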
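To make the walk-through above concrete, here is a toy model of that behaviour. It is not Boost's code: `ToyDate`, `read_digits` and `toy_parse` are hypothetical names, only %Y, %m and %d are handled, and the default field values are assumptions chosen to mirror the observed output. It only illustrates what "skip one character per separator without comparing" does to the input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy model only, NOT the Boost implementation.
// Defaults are assumptions: a field that is never read keeps its default.
struct ToyDate {
    int year = 0;
    int month = 1;
    int day = 1;
};

// Read up to `width` decimal digits starting at `pos`.
// If no digit is available, `out` is left untouched (field treated as optional).
static void read_digits(const std::string& in, std::size_t& pos, int width, int& out) {
    int value = 0;
    int count = 0;
    while (count < width && pos < in.size() &&
           std::isdigit(static_cast<unsigned char>(in[pos]))) {
        value = value * 10 + (in[pos] - '0');
        ++pos;
        ++count;
    }
    if (count > 0) out = value;
}

ToyDate toy_parse(const std::string& fmt, const std::string& in) {
    ToyDate d;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '%' && i + 1 < fmt.size()) {
            switch (fmt[++i]) {
                case 'Y': read_digits(in, pos, 4, d.year);  break;
                case 'm': read_digits(in, pos, 2, d.month); break;
                case 'd': read_digits(in, pos, 2, d.day);   break;
                default: break; // other specifiers are ignored in this sketch
            }
        } else if (pos < in.size()) {
            // Literal in the format: skip one input character
            // WITHOUT checking that it matches the literal.
            ++pos;
        }
    }
    return d;
}
```

With the format "%Y-%m-%d", this model turns "20081023" into year 2008, month 2, day 1, and "2008-10-23" into year 2008, month 10, day 23, which matches the behaviour described in the question.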

A lot of these things strike me as very very sloppy. I'd write my own parsing here, in a jiffy.

DEMO

Live On Coliru

#include <boost/date_time/gregorian/gregorian.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_match.hpp>
#include <iostream>
#include <sstream>

struct as_yyyy_mm_dd {
    boost::gregorian::date& _into;

    friend std::istream& operator>>(std::istream& is, as_yyyy_mm_dd&& manip) {
        using namespace boost::spirit::qi;

        unsigned short y,m,d;
        if (is >> match(
                    uint_parser<unsigned short, 10, 4, 4>() >> '-' >>
                    uint_parser<unsigned short, 10, 2, 2>() >> '-' >>
                    uint_parser<unsigned short, 10, 2, 2>(), 
                    y, m, d))
        {
            manip._into = { y, m, d };
        }

        return is;
    }
};

int main() {
    boost::gregorian::date date;

    for (auto input : { "20081023", "2008-10-23" })
    {
        std::cout << "Parsing: '" << input << "' ";
        std::stringstream ss(input);
        //ss.exceptions(std::ios_base::failbit);

        if (ss >> as_yyyy_mm_dd{ date })
            std::cout << "Parsed: " << date << std::endl;
        else
            std::cout << "Parse failed\n";
    }
}

Prints:

Parsing: '20081023' Parse failed
Parsing: '2008-10-23' Parsed: 2008-Oct-23
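If pulling in Spirit is not an option, a lighter alternative is to check the shape of the string yourself before handing it to the facet-based stream. The helper below is a minimal sketch using only the standard library; the name `has_yyyy_mm_dd_shape` is hypothetical, and it checks only the digit and separator positions, not whether the month and day are in range.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true only if `s` is exactly ten characters,
// with '-' at positions 4 and 7 and decimal digits everywhere else.
// It does NOT check that the month or day values are valid.
bool has_yyyy_mm_dd_shape(const std::string& s) {
    if (s.size() != 10) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool want_dash = (i == 4 || i == 7);
        if (want_dash) {
            if (s[i] != '-') return false;
        } else if (!std::isdigit(static_cast<unsigned char>(s[i]))) {
            return false;
        }
    }
    return true;
}
```

Calling this on date_str first and rejecting the input when it returns false means "20081023" never reaches the stream, while "2008-10-23" is passed on to the usual ss >> date extraction.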

Upvotes: 4
