user152949

Internal Distributed Time Server implementation

I've made an internal distributed time server (with no masters) for our upcoming distributed NoSQL database system. It should handle Byzantine clock faults and clock skew as long as two thirds of the clocks in the distributed system are correct.

I would, however, like to see how others have implemented this kind of pattern (I'm not interested in master/slave implementations such as those based on IEEE 1588), preferably in open source code that is already in use, to check that I've implemented it correctly, since it's hard to write unit tests for it.
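On the unit-testing difficulty: one option is to inject the local clock instead of calling time() directly, so a test can control "now". The following is only a minimal sketch under that assumption. TestableTimeServer, ClockMs and GetTimeMs are hypothetical names that are not part of the code in this question, and the sketch returns false where the original logs an error.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical, simplified stand-in for CTimeServer: the local clock is
// injected, so a test can control "now" instead of reading the system clock.
class TestableTimeServer {
public:
    using ClockMs = std::function<std::int64_t()>;

    TestableTimeServer(ClockMs clock, std::int64_t maxSkewMs)
        : clock_(std::move(clock)), maxSkewMs_(maxSkewMs) {
        assert(maxSkewMs_ >= 0);
    }

    // Returns false (and forgets the node) if its skew exceeds the limit.
    bool RegisterNodeTime(int nodeId, std::int64_t timestampMs) {
        const std::int64_t offset = timestampMs - clock_();
        if (offset > maxSkewMs_ || offset < -maxSkewMs_) {
            offsets_.erase(nodeId);
            return false;
        }
        offsets_[nodeId] = offset;
        return true;
    }

    // Local time plus the arithmetic mean of the accepted offsets.
    std::int64_t GetTimeMs() const {
        std::int64_t sum = 0;
        for (const auto& entry : offsets_) sum += entry.second;
        const std::int64_t n = static_cast<std::int64_t>(offsets_.size());
        return clock_() + (n > 0 ? sum / n : 0);
    }

private:
    ClockMs clock_;
    std::int64_t maxSkewMs_;
    std::map<int, std::int64_t> offsets_;
};
```

A test can then pass a lambda that returns a variable it owns, register a few in-range and out-of-range timestamps, and check the result of GetTimeMs() deterministically.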

Does anyone know of such an open source implementation? We use C++, so I would prefer C/C++ references, though that matters less as long as the code is human-readable.

Here is the code (partially pseudo-code for the sake of simplicity) of my implementation so far:

#include <atomic>
#include <cstdlib>
#include <ctime>
#include <map>
#include <sstream>
#include <string>

/*!
\brief Maximum allowed clock skew in milliseconds
\details A network node that has a clock skew greater than this value will be ignored
* and an error message will be logged
\note Maximum divergence (drift) between two clocks on the network nodes will be 3x this amount if we 
* have a worst case Byzantine clock issue
*/
#define MAX_ALLOWED_CLOCK_SCEW_MS 333

/*!
\class CTimeServer
\brief Internal distributed time server
\details The time server frequently receives the time from all the other master server nodes
* in the DBMS and calculates the current time by averaging all of the received timestamps.
\note If a server node has a greater clock skew than \c MAX_ALLOWED_CLOCK_SCEW_MS then its
* timestamp is ignored and an error message is logged
\note Clocks are accurately synchronized until more than 1/3 of the nodes have Byzantine clock issues
\author Inge Eivind Henriksen
\date February 2014
*/
class CTimeServer
{
    private:
        /** System offset in milliseconds */
        std::atomic<int> offsetAverageMs;

        /*!
        \brief Node offsets type
        \par key Node ID
        \par value Offset in milliseconds
        */
        typedef std::map<int, int> nodeOffset_t;

        /*!
        \brief Iterator type for \c nodeOffset_t
        \relates nodeOffset_t
        */
        typedef nodeOffset_t::iterator nodeOffsetIter_t;

        /** Node offsets */
        nodeOffset_t nodeOffsets;

        /*!
        \brief Calculates the offset time in milliseconds between all the nodes in the distributed system
        */
        int CalculateOffsetMs() {
            // No offsets registered yet, so there is nothing to correct
            if (nodeOffsets.empty())
                return 0;

            // Sum all offsets (in 64 bits to avoid overflow) and divide once.
            // Halving a running sum on every step would weight later nodes more heavily.
            long long sumMs = 0;

            for (nodeOffsetIter_t it = nodeOffsets.begin(); it != nodeOffsets.end(); ++it)
                sumMs += it->second;

            return (int)(sumMs / (long long)nodeOffsets.size());
        }
    public:
        CTimeServer() {
            offsetAverageMs = 0;
        }

        /*!
        \brief Register the time of a node
        \param nodeHostName [in] Network node host name or IP address
        \param nodeId [in] Network node ID
        \param timestamp [in] Network node timestamp
        */
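        void RegisterNodeTime(const wchar_t *nodeHostName, int nodeId, time_t timestamp) {
            // time() and time_t count whole seconds here, so convert the difference
            // to milliseconds before comparing it with the millisecond limit
            long long offsetMs = ((long long)timestamp - (long long)time(NULL)) * 1000LL;

            // Make sure the node clock is within the permitted values
            if (offsetMs > MAX_ALLOWED_CLOCK_SCEW_MS || offsetMs < -MAX_ALLOWED_CLOCK_SCEW_MS)
            {
                // Node clock skew was outside the permitted limit, so remove it from the list of valid time offsets
                nodeOffsets.erase(nodeId);
                offsetAverageMs.store(CalculateOffsetMs());

                // Throw the message by value; a pointer from c_str() on a temporary would dangle
                std::wstringstream err;
                err << L"Network node " << nodeHostName << L" has a clock offset of " << offsetMs
                    << L" ms, which exceeds the maximum allowed clock skew of "
                    << MAX_ALLOWED_CLOCK_SCEW_MS << L" ms. Set the clock to correct this problem.";
                throw err.str();
            }

            // Insert or overwrite the offset for this node (the value fits in an int after the range check)
            nodeOffsets[nodeId] = (int)offsetMs;

            // Recalculate the offset average
            offsetAverageMs.store(CalculateOffsetMs());
        }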

        /*!
        \brief Get the distributed system time
        \returns The distributed system time
        */
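        time_t GetTime() {
            // The averaged offset is in milliseconds while time_t counts seconds,
            // so sub-second corrections are truncated here
            time_t now = time(NULL);
            return now + (time_t)(offsetAverageMs.load() / 1000);
        }
};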
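
For the Byzantine tolerance goal, a plain mean can be pulled arbitrarily far by a single faulty clock that stays inside the skew limit on each round. A common alternative in the literature is a fault-tolerant average: with n nodes and at most f faulty ones, where n > 3f, drop the f smallest and f largest offsets before averaging. This is a swapped-in technique and not what the code above does. FaultTolerantAverageMs and maxFaulty are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the original CTimeServer.
// Sorts the offsets, drops the maxFaulty smallest and maxFaulty largest
// values, and returns the arithmetic mean of what remains.
// Precondition: offsetsMs.size() > 3 * maxFaulty.
std::int64_t FaultTolerantAverageMs(std::vector<std::int64_t> offsetsMs,
                                    std::size_t maxFaulty) {
    assert(offsetsMs.size() > 3 * maxFaulty);

    std::sort(offsetsMs.begin(), offsetsMs.end());

    std::int64_t sum = 0;
    for (std::size_t i = maxFaulty; i < offsetsMs.size() - maxFaulty; ++i)
        sum += offsetsMs[i];

    const std::size_t kept = offsetsMs.size() - 2 * maxFaulty;
    return sum / static_cast<std::int64_t>(kept);
}
```

With four nodes and one tolerated fault, an absurd reading such as 900000 ms is discarded along with the smallest value, and only the two middle offsets contribute to the result.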

Upvotes: 6

Views: 628

Answers (1)

eh9

Reputation: 7428

There's a fair amount of literature on time synchronization protocols, particularly for wireless sensor networks, where the deployment environment does not lend itself to time masters. There's a decent introduction to the topic on this page. The protocol that seems to have gotten the most attention is the Flooding Time Synchronization Protocol (FTSP), from a paper of that name by Maróti, Kusy, Simon, and Lédeczi. I found an implementation for TinyOS, described on its wiki, which has the kind of code you're seeking.

There's a caveat to make about any master-less system: there's no notion of "correct" time. The best you can get is convergence of the nodes to a common time reference. This is a consensus time, but it shouldn't be considered an authoritatively "correct" one.
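
To illustrate the caveat, here is a small sketch of a toy model (an assumption for illustration, not FTSP and not the code in the question): each node holds a clock error relative to true time, and in every round it averages its own value with its two neighbours on a ring. GossipRound and Spread are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model: errorsMs[i] is node i's clock error in ms relative
// to true time. In one round every node replaces its error with the mean of
// its own value and those of its two ring neighbours.
std::vector<double> GossipRound(const std::vector<double>& errorsMs) {
    assert(!errorsMs.empty());
    const std::size_t n = errorsMs.size();
    std::vector<double> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left = errorsMs[(i + n - 1) % n];
        const double right = errorsMs[(i + 1) % n];
        next[i] = (left + errorsMs[i] + right) / 3.0;
    }
    return next;
}

// Difference between the largest and smallest clock error.
double Spread(const std::vector<double>& errorsMs) {
    assert(!errorsMs.empty());
    const auto range = std::minmax_element(errorsMs.begin(), errorsMs.end());
    return *range.second - *range.first;
}
```

Starting from errors of 120, -60, 30 and 90 ms, repeated rounds shrink the spread toward zero, but every node settles at the initial mean of 45 ms rather than at 0. The nodes agree with each other, yet none of them is "correct".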

Upvotes: 6
