CuppM

Reputation: 1698

Parsing of Ordered Timestamps in Local Time (to UTC) While Observing Daylight Saving Time

I have CSV data files with timestamped records in local time. Unfortunately, the files cover the period when daylight saving time ends (Nov 3rd 2013), so the time components of the records go: 12:45, 1:00, 1:15, 1:30, 1:45, 1:00, 1:15, 1:30, 1:45, 2:00. I want to convert the values and store them in the database as UTC.

Unfortunately, the standard DateTime.Parse() function in .NET parses them like this (all on November 3rd 2013):

| Time String | Parsed Local Time | In DST | Parsed Local Time to UTC |
| 12:45 am    | 12:45 am          | Yes    | 4:45 am                  |
| 12:59:59 am | 12:59:59 am       | Yes    | 4:59:59 am               |
| 01:00 am    | 1:00 am           | No     | 6:00 am                  |
| 01:15 am    | 1:15 am           | No     | 6:15 am                  |

So it never sees the 1:00-1:59:59 am range as being in DST, and my parsed UTC timestamps jump forward by an hour (from 4:59:59 am straight to 6:00 am).
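To make the jump concrete, here is a minimal Python sketch (standard library only) showing that the same wall-clock string maps to two different UTC instants depending on which offset applies. The fixed offsets `EDT` and `EST` are hard-coded for illustration and are not looked up from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, hard-coded for illustration only.
EDT = timezone(timedelta(hours=-4), "EDT")  # daylight time
EST = timezone(timedelta(hours=-5), "EST")  # standard time

# 01:00 occurs twice on this date: once in EDT, once in EST.
naive = datetime.strptime("2013/11/03 01:00:00", "%Y/%m/%d %H:%M:%S")

utc_if_dst = naive.replace(tzinfo=EDT).astimezone(timezone.utc)
utc_if_standard = naive.replace(tzinfo=EST).astimezone(timezone.utc)

print(utc_if_dst)       # 2013-11-03 05:00:00+00:00
print(utc_if_standard)  # 2013-11-03 06:00:00+00:00
```

A parser that always picks the second interpretation produces exactly the one-hour gap shown in the table.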

Is there a library or class out there that will allow me to parse the timestamps and take into account the change in DST? Like some instantiatable class that will remember the stream of timestamps it's already received and adjust the parsed timestamp accordingly?

Assumptions about the data that can be made while parsing:

  1. I have the start time of the file (the timestamp of the first record) in the header section of the file, in both local time and UTC.
  2. The records are in order by timestamp.
  3. All local times are in the US Eastern time zone.
  4. The data could also go the other way: from out of DST into it.
  5. The records contain a full timestamp in the format: yyyy/MM/dd HH:mm:ss (2013/11/03 00:45:00)

Note: While my software is in C#, I did not tag C#/.NET specifically as I figured I could use any language's implementation of a solution and recode if necessary.

Upvotes: 3

Views: 1017

Answers (2)

jfs

Reputation: 414825

If successive timestamps never go backwards when expressed in UTC, then this Python script can convert the local times to UTC:

#!/usr/bin/env python3
import sys
from datetime import datetime
import pytz  # $ pip install pytz

tz = pytz.timezone('America/New_York' if len(sys.argv) < 2 else sys.argv[1])
previous = None #XXX set it from UTC time: `first_entry_utc.astimezone(tz)`
for line in sys.stdin: # read from stdin
    naive = datetime.strptime(line.strip(), "%Y/%m/%d %H:%M:%S") # no timezone
    try:
        local = tz.localize(naive, is_dst=None) # attach timezone info
    except pytz.AmbiguousTimeError:
        # assume ambiguous time always corresponds to True -> False transition
        local = tz.localize(naive, is_dst=True)
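        # `previous` is None on the first record unless it was seeded above
        if previous is not None and previous >= local: # timestamps must be increasing
            local = tz.localize(naive, is_dst=False)
        assert previous is None or previous < local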
    #NOTE: allow NonExistentTimeError to propagate (there shouldn't be
    # invalid local times in the input)
    previous = local
    utc = local.astimezone(pytz.utc)
    timestamp = utc.timestamp()
    time_format = "%Y-%m-%d %H:%M:%S %Z%z"
    print("{local:{time_format}}; {utc:{time_format}}; {timestamp:.0f}"
          .format_map(vars()))

Input

2013/11/03 00:45:00
2013/11/03 01:00:00
2013/11/03 01:15:00
2013/11/03 01:30:00
2013/11/03 01:45:00
2013/11/03 01:00:00
2013/11/03 01:15:00
2013/11/03 01:30:00
2013/11/03 01:45:00
2013/11/03 02:00:00

Output

2013-11-03 00:45:00 EDT-0400; 2013-11-03 04:45:00 UTC+0000; 1383453900
2013-11-03 01:00:00 EDT-0400; 2013-11-03 05:00:00 UTC+0000; 1383454800
2013-11-03 01:15:00 EDT-0400; 2013-11-03 05:15:00 UTC+0000; 1383455700
2013-11-03 01:30:00 EDT-0400; 2013-11-03 05:30:00 UTC+0000; 1383456600
2013-11-03 01:45:00 EDT-0400; 2013-11-03 05:45:00 UTC+0000; 1383457500
2013-11-03 01:00:00 EST-0500; 2013-11-03 06:00:00 UTC+0000; 1383458400
2013-11-03 01:15:00 EST-0500; 2013-11-03 06:15:00 UTC+0000; 1383459300
2013-11-03 01:30:00 EST-0500; 2013-11-03 06:30:00 UTC+0000; 1383460200
2013-11-03 01:45:00 EST-0500; 2013-11-03 06:45:00 UTC+0000; 1383461100
2013-11-03 02:00:00 EST-0500; 2013-11-03 07:00:00 UTC+0000; 1383462000
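
On Python 3.9+ the same idea can be sketched with the standard-library `zoneinfo` module instead of pytz. This is a minimal sketch, not a drop-in replacement: the class name `OrderedLocalToUtc` and its `start_utc` parameter are made up for illustration, and it assumes that the input never contains nonexistent local times and that a time zone database is available (system tzdata or the `tzdata` package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+


class OrderedLocalToUtc:
    """Hypothetical helper: converts ordered naive local timestamps to UTC."""

    def __init__(self, zone="America/New_York", start_utc=None):
        self.tz = ZoneInfo(zone)
        # Optional aware UTC datetime, e.g. the start time from the file header.
        self.previous_utc = start_utc

    def convert(self, text):
        naive = datetime.strptime(text, "%Y/%m/%d %H:%M:%S")
        # fold=0 is the first occurrence of a wall-clock time, fold=1 the second.
        # For unambiguous times both give the same UTC instant.
        first = naive.replace(tzinfo=self.tz, fold=0).astimezone(timezone.utc)
        second = naive.replace(tzinfo=self.tz, fold=1).astimezone(timezone.utc)
        utc = first
        if self.previous_utc is not None and first < self.previous_utc:
            # The first occurrence would move time backwards, so this must be
            # the second occurrence of an ambiguous time.
            utc = second
        if self.previous_utc is not None and utc < self.previous_utc:
            raise ValueError("timestamps are not in increasing order")
        self.previous_utc = utc
        return utc


converter = OrderedLocalToUtc()
for text in ["2013/11/03 01:45:00", "2013/11/03 01:00:00"]:
    print(converter.convert(text))
# 2013-11-03 05:45:00+00:00
# 2013-11-03 06:00:00+00:00
```

Comparing in UTC rather than in local time keeps the ordering check meaningful across the transition.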

Upvotes: 2

Matt Johnson-Pint

Reputation: 241888

In C#:

using System;
using System.Globalization;

// Define the input values.
string[] input =
{
    "2013-11-03 00:45:00",
    "2013-11-03 01:00:00",
    "2013-11-03 01:15:00",
    "2013-11-03 01:30:00",
    "2013-11-03 01:45:00",
    "2013-11-03 01:00:00",
    "2013-11-03 01:15:00",
    "2013-11-03 01:30:00",
    "2013-11-03 01:45:00",
    "2013-11-03 02:00:00",
};

// Get the time zone the input is meant to be interpreted in.
TimeZoneInfo tz = TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time");

// Create an array for the output values
DateTimeOffset[] output = new DateTimeOffset[input.Length];

// Start with the assumption that DST is active, as ambiguities occur when moving
// out of daylight time into standard time.
bool dst = true;

// Iterate through the input.
for (int i = 0; i < input.Length; i++)
{
    // Parse the input string as a DateTime with Unspecified kind
    DateTime dt = DateTime.ParseExact(input[i], "yyyy-MM-dd HH:mm:ss",
                                      CultureInfo.InvariantCulture);

    // Determine the offset.
    TimeSpan offset;
    if (tz.IsAmbiguousTime(dt))
    {
        // Get the possible offsets, and use the DST flag and the previous entry
        // to determine if we are past the transition point.  This only works
        // because we have outside knowledge that the items are in sequence.
        TimeSpan[] offsets = tz.GetAmbiguousTimeOffsets(dt);
        offset = dst && (i == 0 || dt >= output[i - 1].DateTime)
                 ? offsets[1] : offsets[0];
    }
    else
    {
        // The value is unambiguous, so just get the single offset it can be.
        offset = tz.GetUtcOffset(dt);
    }

    // Use the determined values to construct a DateTimeOffset
    DateTimeOffset dto = new DateTimeOffset(dt, offset);

    // We can unambiguously check a DateTimeOffset for daylight saving time,
    // which sets up the DST flag for the next iteration.
    dst = tz.IsDaylightSavingTime(dto);

    // Save the DateTimeOffset to the output array.
    output[i] = dto;
}


// Show the output for debugging
foreach (var dto in output)
{
    Console.WriteLine("{0:yyyy-MM-dd HH:mm:ss zzzz} => {1:yyyy-MM-dd HH:mm:ss} UTC",
                       dto, dto.UtcDateTime);
}

Output:

2013-11-03 00:45:00 -04:00 => 2013-11-03 04:45:00 UTC
2013-11-03 01:00:00 -04:00 => 2013-11-03 05:00:00 UTC
2013-11-03 01:15:00 -04:00 => 2013-11-03 05:15:00 UTC
2013-11-03 01:30:00 -04:00 => 2013-11-03 05:30:00 UTC
2013-11-03 01:45:00 -04:00 => 2013-11-03 05:45:00 UTC
2013-11-03 01:00:00 -05:00 => 2013-11-03 06:00:00 UTC
2013-11-03 01:15:00 -05:00 => 2013-11-03 06:15:00 UTC
2013-11-03 01:30:00 -05:00 => 2013-11-03 06:30:00 UTC
2013-11-03 01:45:00 -05:00 => 2013-11-03 06:45:00 UTC
2013-11-03 02:00:00 -05:00 => 2013-11-03 07:00:00 UTC

Note that this assumes that the first time you encounter an ambiguous time like 1:00, it is in DST. Say your list was truncated to just the last 5 entries: you wouldn't know that those were in standard time. There's not much you could do in that particular case from the timestamps alone.
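
If the file header carries the start time in UTC, as the question states, the initial DST state does not have to be guessed, because converting from UTC to local time is never ambiguous. Below is a minimal Python sketch of that seeding step. The function name `dst_active_at` is hypothetical, and the sketch assumes Python 3.9+ with a time zone database available.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+


def dst_active_at(start_utc, zone="America/New_York"):
    """Return True if DST is in effect in `zone` at the given aware UTC instant."""
    local = start_utc.astimezone(ZoneInfo(zone))
    return local.dst() != timedelta(0)


# Both instants read 01:00 on the local wall clock on 2013-11-03.
print(dst_active_at(datetime(2013, 11, 3, 5, 0, tzinfo=timezone.utc)))  # True
print(dst_active_at(datetime(2013, 11, 3, 6, 0, tzinfo=timezone.utc)))  # False
```

The result could serve as the initial value of the `dst` flag instead of hard-coding `true`.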

Upvotes: 3
