MKS

Reputation: 73

Get date, month and year value from Date string

I want to get the date, month and year information from the string.

Example Date String: Thu, 30 Jul 2020 00:51:08 -0700 (PDT)

PDT here is for Pacific Daylight time. This string offset (-0700) can change based on system timezone when the file was created.

I need to write a C++ program to extract the date, month and year from this string.

Any thoughts on how to go about this?

Upvotes: 1

Views: 3916

Answers (2)

Howard Hinnant

Reputation: 218750

This is a story of evolution. The correct answer greatly depends on your current toolset (how modern it is). And even if it is completely modern, there are still better tools coming.

Homo habilis

In C++98 we could stand upright. And we had tools to scan ints out of arrays of chars. sscanf was the tool to do this. The result was not type safe, but we could scan ints and strings and then reinterpret those values as the components of a date: year, month and day. This might look something like this:

#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

int
main()
{
    using namespace std;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    char const* months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wd[4] = {};
    int d;
    char mon[4] = {};
    int y;
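    // %3s caps each abbreviation at 3 chars so the 4-byte buffers cannot
    // overflow; the literal ',' consumes the comma after the weekday.
    sscanf(s.c_str(), "%3s, %d %3s %d", wd, &d, mon, &y);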
    int m;
    for (m = 0; m < 12; ++m)
        if (strcmp(months[m], mon) == 0)
            break;
    ++m;
    cout << y << '\n';
    cout << m << '\n';
    cout << d << '\n';
}

This outputs:

2020
7
30

Notes:

  • The " 00:51:08 -0700 (PDT)" is never even parsed. It could be parsed. But it is a lot more work.
  • There's no error checking. This might be a valid date or might not.
  • There's no type safety. The results are just ints and if you mix them up, it's a run-time error, not a compile-time error.
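The missing error checking noted above can be added without leaving the sscanf toolset. The following is a minimal sketch, not part of the original answer: SimpleDate and parse_date_fields are hypothetical names introduced for illustration. It checks how many fields sscanf converted and rejects an unknown month abbreviation. It still does not verify that the day is valid for the month.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper type (an assumption, not from the answer): plain-int date fields.
struct SimpleDate
{
    int year;
    int month;  // 1-12
    int day;
};

// Returns true and fills `out` only if all four fields were scanned and the
// month abbreviation is recognized. Day-of-month range is NOT validated.
bool
parse_date_fields(char const* s, SimpleDate& out)
{
    static char const* const months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wd[4] = {};
    char mon[4] = {};
    int d = 0;
    int y = 0;
    // %3s limits each abbreviation to 3 chars, so the 4-byte buffers cannot overflow.
    // sscanf returns the number of successfully converted fields.
    if (std::sscanf(s, "%3s, %d %3s %d", wd, &d, mon, &y) != 4)
        return false;
    for (int m = 0; m < 12; ++m)
    {
        if (std::strcmp(months[m], mon) == 0)
        {
            out.year = y;
            out.month = m + 1;
            out.day = d;
            return true;
        }
    }
    return false;  // unknown month abbreviation
}
```

A caller would test the return value before touching the fields, for example: if (parse_date_fields(s.c_str(), date)) { ... }.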

Neanderthal

Using C++98, there's also a popular but non-standard solution: strptime.

#include <time.h>
#include <iostream>
#include <string>

int
main()
{
    using namespace std;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    tm tm = {};  // zero-initialize: strptime need not set every field
    strptime(s.c_str(), "%a, %d %b %Y %T", &tm);
    cout << tm.tm_year + 1900 << '\n';
    cout << tm.tm_mon + 1 << '\n';
    cout << tm.tm_mday << '\n';
    cout << tm.tm_hour << '\n';
    cout << tm.tm_min << '\n';
    cout << tm.tm_sec << '\n';
}

strptime is in the POSIX standard, but not in the C or C++ standards. It is widely available on POSIX platforms, though not in Microsoft's C runtime. So it is a popular extension. And with good reason. It is much higher level, and puts the results into a struct tm: A type representing a date/time; the beginnings of type safety.

Output:

2020
7
30
0
51
8

There are still some problems:

  • " -0700 (PDT)" is never parsed. There's no portable way to ask strptime to do this.
  • There are weird and inconsistent offsets on the different fields of tm. For example the month is zero-based and the day is one-based. But at least it knows how to parse the time too, and relatively easily.
  • Error checking is there but easy to ignore. strptime returns NULL if something bad happens.
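Since the NULL return is so easy to ignore, it can help to wrap the call so the caller has to look at a bool. This is a minimal sketch under the assumption of a POSIX platform that provides strptime; parse_with_strptime is a hypothetical name, not something from the answer or from POSIX.

```cpp
#include <time.h>

// Hypothetical wrapper (an assumption for illustration).
// Returns true only if strptime matched the whole format. On success `out`
// holds the parsed fields and, if `rest` is non-null, *rest points at the
// first unparsed character (here the " -0700 (PDT)" tail).
bool
parse_with_strptime(char const* s, tm& out, char const** rest)
{
    tm tmp = {};  // zero-initialize: strptime need not set every field
    char const* end = strptime(s, "%a, %d %b %Y %T", &tmp);
    if (end == NULL)
        return false;  // parse failure: leave `out` untouched
    out = tmp;
    if (rest != NULL)
        *rest = end;
    return true;
}
```

Returning the end pointer also makes the unparsed offset and abbreviation visible to the caller, who can then decide how to handle them.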

Cro-Magnon

C++11 brought std::get_time, a C++ counterpart to strptime that is officially part of the C++ standard:

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int
main()
{
    using namespace std;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    istringstream in{s};
    in.exceptions(ios::failbit);
    tm tm{};  // zero-initialize: get_time only sets the fields it parses
    in >> get_time(&tm, "%a, %d %b %Y %T");
    cout << tm.tm_year + 1900 << '\n';
    cout << tm.tm_mon + 1 << '\n';
    cout << tm.tm_mday << '\n';
    cout << tm.tm_hour << '\n';
    cout << tm.tm_min << '\n';
    cout << tm.tm_sec << '\n';
}

With a C++ wrapper you can parse from streams, which lets you opt in to an exception on parse failure. But it is still a simple wrapper and so the result is just a tm. This has the same weirdness as the previous solution.
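If exceptions are not wanted, the same failure can be observed through the stream state instead. The sketch below assumes the same format string as above; parse_with_get_time is a hypothetical name introduced only for this illustration.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration).
// Returns true and fills `out` only if get_time matched the whole format.
bool
parse_with_get_time(std::string const& s, std::tm& out)
{
    std::istringstream in{s};
    std::tm tmp{};  // zero-initialize: get_time only sets the fields it parses
    in >> std::get_time(&tmp, "%a, %d %b %Y %T");
    if (in.fail())
        return false;  // failbit is set on a parse error; `out` is untouched
    out = tmp;
    return true;
}
```

Whether to check in.fail() or enable in.exceptions(ios::failbit) is a design choice: the check keeps control flow local, while the exception makes a forgotten check impossible.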

The output is the same as in the previous solution:

2020
7
30
0
51
8

Homo sapiens

Though the strongly typed std::chrono time_point / duration system was introduced in C++11, it is not until C++20 that it is integrated with the civil calendar, gaining get_time-like functionality, and going far beyond that.

#include <chrono>
#include <iostream>
#include <sstream>
#include <string>

int
main()
{
    using namespace std;
    using namespace std::chrono;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    istringstream in{s};
    in.exceptions(ios::failbit);
    local_seconds t;
    in >> parse("%a, %d %b %Y %T %z (%Z)", t);
    auto td = floor<days>(t);
    year_month_day ymd{td};
    hh_mm_ss hms{t-td};
    cout << ymd << ' ' << hms << '\n';
    cout << ymd.year() << '\n';
    cout << ymd.month() << '\n';
    cout << ymd.day() << '\n';
    cout << hms.hours() << '\n';
    cout << hms.minutes() << '\n';
    cout << hms.seconds() << '\n';
}

Output:

2020-07-30 00:51:08
2020
Jul
30
0h
51min
8s

The first thing to notice is the much stronger type-safety. No longer is there a need to convert everything to ints to print it out. And no longer is it necessary to convert to ints to do other operations such as arithmetic and comparison.

For example ymd.year() has type std::chrono::year, not int. If necessary, one can explicitly convert between these two representations. But it is generally unnecessary, and akin to a risky reinterpret_cast.

There are no longer unintuitive biases such as 1900, or zero-based counts in unexpected places.

Output generally includes the units for easier debugging.

The " -0700 (PDT)" is parsed here! These values are not used in the results, but they must be there, else there is a parse error. And if you want to get these values, they are available with very simple changes:

string abbrev;
minutes offset;
in >> parse("%a, %d %b %Y %T %z (%Z)", t, abbrev, offset);
...
cout << offset << '\n';
cout << abbrev << '\n';

Now the output includes:

-420min
PDT

If you need the fields in UTC, instead of in local time, that is one simple change:

sys_seconds t;

instead of:

local_seconds t;

Now the offset is subtracted from the parsed time point to result in a UTC time_point (a std::chrono::time_point based on system_clock) instead and the output changes to:

2020-07-30 07:51:08
2020
Jul
30
7h
51min
8s

This allows you to easily parse local times plus offset directly into system_clock::time_point.
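The offset arithmetic itself can be illustrated with nothing more than the C++11 duration types. This is only a sketch of the subtraction described above, not what parse does internally; utc_time_of_day is a hypothetical name, and the result is not wrapped into the range of a single day.

```cpp
#include <chrono>

// Hypothetical helper (an assumption for illustration).
// local_tod:  time elapsed since local midnight.
// utc_offset: UTC offset of the local time (e.g. -420min for -0700).
// Returns the corresponding time since that same midnight, expressed in UTC.
// The result may be negative or exceed 24h when the UTC date differs.
std::chrono::seconds
utc_time_of_day(std::chrono::seconds local_tod, std::chrono::minutes utc_offset)
{
    // UTC = local - offset; minutes convert implicitly (and losslessly) to seconds.
    return local_tod - utc_offset;
}
```

For the example string, 00:51:08 local with an offset of -420min gives 07:51:08, matching the hours, minutes and seconds in the UTC output above.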

Though not shipping yet (as I write this), vendors are working on implementing this. And in the meantime you can get this functionality with a free, open-source, header-only C++20 <chrono> preview library that works with C++11/14/17. Just add #include "date/date.h" and using namespace date; and everything just works. Though with C++11/14 you will need to substitute hh_mm_ss<seconds> hms{t-td}; for hh_mm_ss hms{t-td}; (lack of CTAD).

Upvotes: 2

Alonso Mondal

Reputation: 64

#include <time.h>
char *strptime(const char *buf, const char *format, struct tm *tm);

Upvotes: 0
