I want to get the day, month and year from a date string.
Example date string: `Thu, 30 Jul 2020 00:51:08 -0700 (PDT)`
PDT here stands for Pacific Daylight Time. The offset (`-0700`) can differ depending on the system time zone in effect when the file was created.
I need to write a C++ program to extract the day, month and year from this string.
Any thoughts on how to go about this?
This is a story of evolution. The correct answer greatly depends on your current toolset (how modern it is). And even if it is completely modern, there are still better tools coming.
In C++98 we could stand upright. And we had tools to scan `int`s out of arrays of `char`s. `scanf` was the tool to do this. The result was not type safe, but we could scan `int`s and strings and then reinterpret those values as the components of a date: year, month and day. This might look something like this:
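```cpp
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

int
main()
{
    using namespace std;
    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    char const* months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wd[4] = {};   // weekday abbreviation, e.g. "Thu"
    int d;
    char mon[4] = {};  // month abbreviation, e.g. "Jul"
    int y;
    // The %3s widths keep sscanf inside the 4-char buffers; a bare %s
    // would store "Thu," plus its terminator (5 bytes) into wd[4].
    sscanf(s.c_str(), "%3s, %d %3s %d", wd, &d, mon, &y);
    int m;
    for (m = 0; m < 12; ++m)
        if (strcmp(months[m], mon) == 0)
            break;
    ++m;  // 0-based index -> 1-based month (13 means "not found")
    cout << y << '\n';
    cout << m << '\n';
    cout << d << '\n';
}
```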
This outputs:

```
2020
7
30
```
Notes:

- `" 00:51:08 -0700 (PDT)"` is never even parsed. It could be parsed, but it is a lot more work.
- There's no type safety: the day and year are just `int`s, and if you mix them up, it's a run-time error, not a compile-time error.

Using C++98, there's also a popular but non-standard solution: `strptime`.
```cpp
#include <time.h>
#include <iostream>
#include <string>

int
main()
{
    using namespace std;
    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    tm tm = {};  // zero-initialize: strptime need not set every field
    strptime(s.c_str(), "%a, %d %b %Y %T", &tm);
    cout << tm.tm_year + 1900 << '\n';  // tm_year counts from 1900
    cout << tm.tm_mon + 1 << '\n';      // tm_mon is zero-based
    cout << tm.tm_mday << '\n';
    cout << tm.tm_hour << '\n';
    cout << tm.tm_min << '\n';
    cout << tm.tm_sec << '\n';
}
```
`strptime` is in the POSIX standard, but not in the C or C++ standards. It is available on POSIX platforms such as Linux and macOS (Microsoft's C runtime does not provide it), so it is a popular extension. And with good reason. It is much higher level, and puts the results into a `struct tm`: a type representing a date/time; the beginnings of type safety.
Output:

```
2020
7
30
0
51
8
```
There are still some problems:

- `" -0700 (PDT)"` is never parsed. There's no portable way to ask `strptime` to do this.
- The fields of `tm` have weird, inconsistent biases: the year counts from 1900, the month is zero-based and the day is one-based. But at least it knows how to parse the time too, and relatively easily.
- `strptime` returns `NULL` if something bad happens.

With C++11 arrived an actual C++ wrapper around `strptime`, officially part of the C++ standard: `std::get_time`:
```cpp
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int
main()
{
    using namespace std;
    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    istringstream in{s};
    in.exceptions(ios::failbit);  // a failed parse throws instead of failing silently
    tm tm{};
    in >> get_time(&tm, "%a, %d %b %Y %T");
    cout << tm.tm_year + 1900 << '\n';
    cout << tm.tm_mon + 1 << '\n';
    cout << tm.tm_mday << '\n';
    cout << tm.tm_hour << '\n';
    cout << tm.tm_min << '\n';
    cout << tm.tm_sec << '\n';
}
```
With a C++ wrapper you can parse from streams, which gives you the option of throwing an exception on parse failure. But it is still a simple wrapper, so the result is just a `tm`, with the same weirdness as the previous solution.
The output is the same as in the previous solution:

```
2020
7
30
0
51
8
```
Though the strongly typed `std::chrono` `time_point`/`duration` system was introduced in C++11, it is not until C++20 that it is integrated with the civil calendar, gaining `get_time`-like functionality, and going far beyond that.
```cpp
#include <chrono>
#include <iostream>
#include <sstream>
#include <string>

int
main()
{
    using namespace std;
    using namespace std::chrono;
    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    istringstream in{s};
    in.exceptions(ios::failbit);  // throw on parse failure
    local_seconds t;
    in >> parse("%a, %d %b %Y %T %z (%Z)", t);
    auto td = floor<days>(t);     // midnight at the start of that (local) day
    year_month_day ymd{td};       // calendar fields
    hh_mm_ss hms{t-td};           // time of day since midnight
    cout << ymd << ' ' << hms << '\n';
    cout << ymd.year() << '\n';
    cout << ymd.month() << '\n';
    cout << ymd.day() << '\n';
    cout << hms.hours() << '\n';
    cout << hms.minutes() << '\n';
    cout << hms.seconds() << '\n';
}
```
Output:

```
2020-07-30 00:51:08
2020
Jul
30
0h
51min
8s
```
The first thing to notice is the much stronger type safety. No longer is there a need to convert everything to `int`s to print it out. And no longer is it necessary to convert to `int`s to do other operations such as arithmetic and comparison.
For example, `ymd.year()` has type `std::chrono::year`, not `int`. If necessary, one can explicitly convert between these two representations. But it is generally unnecessary, and akin to a risky `reinterpret_cast`.
There are no longer unintuitive biases such as 1900, or zero-based counts in unexpected places.
Output generally includes the units for easier debugging.
The `" -0700 (PDT)"` is parsed here! These values are not used in the results, but they must be there, else there is a parse error. And if you want to get these values, they are available with very simple changes:
```cpp
string abbrev;
minutes offset;
in >> parse("%a, %d %b %Y %T %z (%Z)", t, abbrev, offset);
...
cout << offset << '\n';
cout << abbrev << '\n';
```
Now the output includes:

```
-420min
PDT
```
If you need the fields in UTC instead of in local time, that is one simple change:

```cpp
sys_seconds t;
```

instead of:

```cpp
local_seconds t;
```
Now the offset is subtracted from the parsed time point, resulting in a UTC `time_point` (a `std::chrono::time_point` based on `system_clock`) instead, and the output changes to:
```
2020-07-30 07:51:08
2020
Jul
30
7h
51min
8s
```
This allows you to easily parse local times plus offset directly into a `system_clock::time_point`.
Though not shipping yet (as I write this), vendors are working on implementing this. And in the meantime you can get this functionality with a free, open-source, header-only C++20 `<chrono>` preview library that works with C++11/14/17. Just add `#include "date/date.h"` and `using namespace date;` and everything just works. With C++11/14, though, you will need to substitute `hh_mm_ss<seconds> hms{t-td};` for `hh_mm_ss hms{t-td};`, because class template argument deduction (CTAD) only arrived in C++17.
```cpp
#include <time.h>
char *strptime(const char *buf, const char *format, struct tm *tm);
```