Note: Windows is a registered trademark of Microsoft Corporation in the United States and other countries. The Windows Timestamp Project is an independent publication and is not affiliated with, nor has it been authorized, sponsored, or otherwise approved by Microsoft Corporation.

Microsecond Resolution Time Services for Windows

Arno Lentfer, June 2012
Updated in October 2013 / February 2014 (V 1.22 & V 1.30 / V 1.50)

1.  Abstract

Various methods for obtaining high resolution time stamping on Windows have been described. The most promising implementations have been proposed by W. Nathaniel Mills: "When microseconds matter" (2002) and Johan Nilsson: "Implement a Continuously Updating, High-Resolution Time Provider for Windows" (2004).

Suggested auxiliary initial reading: Keith Wansbrough: "Obtaining Accurate Timestamps under Windows XP" (2003), MSDN: "Guidelines for Providing Multimedia Timer Support", and Chuck Walbourn: "Game Timing and Multicore Processors" (2005).

A substantial amount of time and effort has been spent on the attempt to get a proper high resolution time service implemented for Windows. However, the performance of these implementations is still not satisfactory. The complexity arises from the variety of Windows versions running on an even greater variety of hardware platforms.

Proper implementation of an accurate time service for Windows will be discussed and diagnosed within the Windows timestamp project. Test code will be released to prove functionality on a broader range of hardware platforms. Besides the timestamp functionality, high resolution (microsecond) timer functions are also discussed.

2.  Resources

Time resources on Windows are mostly interrupt-controlled entities and therefore show a certain granularity. Typical interrupt periods are 10 ms to 20 ms. The interrupt period can be set to 1 ms, or even slightly below 1 ms, by calling NtSetTimerResolution or timeBeginPeriod. For several reasons, however, it can and shall never be set to anything near the 1 μs regime. The best resolution obtainable by means of Windows time services is therefore in the 1 ms regime.

The best resource for retrieving the system time is the GetSystemTimeAsFileTime API. It is a fast access API that is able to hold sufficiently accurate (100 ns units) values in its arguments. The alternative API is GetSystemTime, which is 20 times slower, has double the structure size, and does not provide a well-suited data format.

An interrupt-independent system resource, the performance counter, is used to extend the accuracy into the microsecond regime. The performance counter API provides the asynchronous calls QueryPerformanceCounter and QueryPerformanceFrequency. A virtual counter delivers a performance counter value, which increases at the performance counter frequency. The frequency is typically a few MHz and can therefore open up the microsecond regime. The counter parameters are typically backed by a physical counter, but they are not necessarily independent of the version of the operating system. The same hardware platform can, for example, deliver different performance counter frequencies under Windows 7 and under Windows Vista.

The Sleep() API and the WaitableTimer API are further timing resources in the context of this project. Their functionality and behavior also need to be examined.

2.1.  GetSystemTimeAsFileTime API

The GetSystemTimeAsFileTime API provides access to the system time in file time format. It is stated as

void WINAPI GetSystemTimeAsFileTime(OUT LPFILETIME lpSystemTimeAsFileTime);

with its argument of type

typedef struct _FILETIME {
DWORD dwLowDateTime;
DWORD dwHighDateTime;
} FILETIME;

A 64-bit FILETIME structure receives the system time as the number of 100 ns units that have elapsed since Jan 1, 1601. After some 400 years, about 1.28×10^10 seconds or 1.28×10^17 100 ns slices have accumulated. The 64-bit value can hold almost 2×10^19 100 ns time slices. The remaining time before this scheme wraps is about 58,000 years from now. The call to GetSystemTimeAsFileTime typically requires 10 ns to 15 ns.
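
The two 32-bit halves are usually joined into one 64-bit count before any arithmetic is done. The following is a minimal, portable sketch of that step and of a conversion to microseconds since the Unix epoch. The type FileTimeParts and both function names are hypothetical stand-ins chosen for this illustration (the real structure is FILETIME from the Windows headers); the offset between Jan 1, 1601 and Jan 1, 1970 is 11,644,473,600 s.

```c
#include <stdint.h>

/* Hypothetical, portable stand-in for the Win32 FILETIME layout. */
typedef struct {
    uint32_t dwLowDateTime;
    uint32_t dwHighDateTime;
} FileTimeParts;

/* Number of 100 ns units between 1601-01-01 and 1970-01-01 (UTC). */
#define FILETIME_UNIX_EPOCH_OFFSET 116444736000000000ULL

/* Join the two halves into one 64-bit count of 100 ns units. */
uint64_t filetime_to_u64(FileTimeParts ft)
{
    return ((uint64_t)ft.dwHighDateTime << 32) | ft.dwLowDateTime;
}

/* Microseconds since the Unix epoch; only valid for times at or after 1970. */
int64_t filetime_to_unix_us(FileTimeParts ft)
{
    uint64_t t = filetime_to_u64(ft);
    return (int64_t)((t - FILETIME_UNIX_EPOCH_OFFSET) / 10u);
}
```

On Windows the same arithmetic applies to the members of a FILETIME filled by GetSystemTimeAsFileTime. The division by 10 discards the 100 ns digit, so the result is truncated, not rounded.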

In order to investigate the real accuracy of the system time provided by this API, the granularity that comes along with the time values needs to be discussed. In other words: How often is the system time updated? A first estimate is provided by the hidden API call:

NTSTATUS NtQueryTimerResolution(OUT PULONG MinimumResolution,
OUT PULONG MaximumResolution,
OUT PULONG ActualResolution);

NtQueryTimerResolution is exported by the native Windows NT library NTDLL.DLL. The ActualResolution reported by this call represents the update period of the system time in 100 ns units, which obviously does not necessarily match the interrupt period. The value depends on the hardware platform. Common hardware platforms report 156,250 or 100,144 for ActualResolution; older platforms may report even larger numbers. This is one of the heartbeats controlling the system. The MinimumResolution and the ActualResolution are relevant for the multimedia timer configuration. Two common hardware platform configurations are discussed here to highlight the details to be dealt with:

Platform configuration A

- MinimumResolution: 156,250
- MaximumResolution: 10,000
- ActualResolution: 156,250

Platform configuration B

- MinimumResolution: 100,144
- MaximumResolution: 10,032
- ActualResolution: 100,144

Platform A simply has 64 timer interrupts per second (64 x 156,250 x 100 ns = 1 s), but when looking at platform B the difficulties become more obvious: 99.856 interrupts per second? Answer: The full second interrupt is not available on all platforms.

However, the system time may be updated at these interrupt events. An API call to

BOOL WINAPI GetSystemTimeAdjustment(OUT PDWORD lpTimeAdjustment,
OUT PDWORD lpTimeIncrement,
OUT PBOOL lpTimeAdjustmentDisabled);

will disclose the time adjustment and time increment values. The actual purpose of this call is to query the status of the system time correction, which is active when TimeAdjustmentDisabled is FALSE. When TimeAdjustmentDisabled is TRUE, no adjustment takes place; TimeAdjustment and TimeIncrement are then equal and report exactly what was read as ActualResolution before. For a platform A type system, the call will report that the system time is incremented by 156,250 100 ns units every 156,250 100 ns units. Within this description, this is considered the granularity of the system time.

Knowing the system time granularity raises doubts about its accuracy. Certainly, the TimeIncrement will be applied, thus changes of the system time will always be one TimeIncrement, but does the interrupt period or any multiple of it always match the time increment?

Even when the standard setting of ActualResolution corresponds to the MinimumResolution, the ActualResolution may have a setting different from MinimumResolution (see table below). In fact it may be configured to values in the range from MinimumResolution to MaximumResolution. The ActualResolution determines the interrupt period of the system. That is the period after which the timer generates an interrupt to let the system react. The ActualResolution can be set by using the API call

NTSTATUS NtSetTimerResolution(IN ULONG RequestedResolution,
IN BOOLEAN Set,
OUT PULONG ActualResolution);

or via the multimedia timer interface

MMRESULT timeBeginPeriod(UINT uPeriod);

with the value of uPeriod derived from the range allowed by

MMRESULT timeGetDevCaps(LPTIMECAPS ptc, UINT cbtc );

which fills the structure

typedef struct {
UINT wPeriodMin;
UINT wPeriodMax;
} TIMECAPS;

Typical values are 1 ms for wPeriodMin and 1,000,000 ms for wPeriodMax. The 1,000 s period for wPeriodMax is somewhat meaningless within the context of this description. However, the possibility of setting the timer resolution to 1 ms requires a more detailed investigation. When the multimedia timer interface is used to set the multimedia timer to wPeriodMin, the ActualResolution received by a call to NtQueryTimerResolution will show a new value. For the two platform configurations discussed, the examples are as follows:

Platform configuration | A | B
MinimumResolution | 156,250 | 100,144
MaximumResolution | 10,000 | 10,032
ActualResolution | 156,250 | 100,144

ActualResolution varies according to the varying multimedia timer periods uPeriod applied by the timeBeginPeriod() API:

Platform configuration | A | B | uPeriod
ActualResolution | 9,766 | 10,032 | 1 ms
ActualResolution | 19,532 | 20,064 | 2 ms
ActualResolution | 19,532 | 30,096 | 3 ms
ActualResolution | 39,063 | 39,952 | 4 ms
ActualResolution | 39,063 | 49,984 | 5 ms
ActualResolution | 39,063 | 60,016 | 6 ms
ActualResolution | 39,063 | 70,048 | 7 ms
ActualResolution | 156,250 | 80,080 | 8 ms
ActualResolution | 156,250 | 89,936 | 9 ms
ActualResolution | 156,250 | 100,144 | 10 ms
ActualResolution | 156,250 | 100,144 | 11 ms
ActualResolution | 156,250 | 100,144 | 12 ms
ActualResolution | 156,250 | 100,144 | 100 ms

This list shows the supported interrupt periods for platforms of type A and B in 100 ns units. Platform A only supports four different interrupt heartbeat frequencies, while platform B has a better approximation to the desired period. The specific numbers are relevant for the procedures described here and thus need a detailed interpretation.

Note: TimeIncrement provided by GetSystemTimeAdjustment and ActualResolution provided by NtQueryTimerResolution are not necessarily identical. Platform A operates with an ACPI PM timer and platform B operates with a PIT timer. More modern platforms do not show "unsupported" values of uPeriod.

2.1.1.  ActualResolution on Platform Type A

The timer intervals are given with 100 ns accuracy in the last digit. Since the true ActualResolution of 0.9765625 ms cannot be expressed exactly in these units, the call to NtQueryTimerResolution reports the rounded value of 0.9766 ms. The other values are rounded as well (their true values are 1.953125 ms and 3.90625 ms, respectively).
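
The relation between the true and the reported values can be reproduced with integer arithmetic. The sketch below assumes that the supported periods of platform A are 156,250 units divided by 16, 8 and 4, and that the fractional 100 ns unit is rounded up. Both assumptions are inferred from the reported values 9,766, 19,532 and 39,063 (19,531.25 is reported as 19,532); they are not documented behavior, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Reported timer resolution in 100 ns units for a base period that is
 * divided by an integer divisor. Assumption: fractions are rounded up.
 */
uint32_t reported_resolution(uint32_t base_100ns, uint32_t divisor)
{
    return (base_100ns + divisor - 1u) / divisor;
}
```

With base_100ns = 156,250, the divisors 16, 8 and 4 yield 9,766, 19,532 and 39,063.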

A quick test using the Sleep(dwMilliseconds) API confirms this assumption:

Sleep(1) = 1.9531 ms = 2 x 0.9765625 ms

Sleep(2) = 2.9295 ms = 3 x 0.9765625 ms

Sleep(3) = 3.9062 ms = 4 x 0.9765625 ms

Sleep() only returns when n x ActualResolution exceeds the desired duration. The required accuracy for the interval specification would have to extend to 0.5 ns, in other words show the 100 ps digit. The number would be 156,250,000 for the MinimumResolution and 9,765,625 for the MaximumResolution (in 100 ps or 10^-10 s units).

Note: Sleep(1) measurements (10,000, with 100 ahead) result in a mean delay of 1953.163824 μs. This is 2.0000397 times the interrupt time slice (should have been 1953.125 μs, so the measurement was off by 0.04 μs).

2.1.2.  ActualResolution on Platform Type B

An interrupt timer period of 1.0032 ms will accumulate 10.032 ms after 10 interrupts and change the system time by 10.0144 ms. A time change of 10.0144 ms after 10.032 ms means that the time is behind by 17.6 μs. At the 57th of such periods, the deviation would have accumulated to 1.0032 ms (57 x 17.6 μs), which is exactly one timer interrupt period, so the time is updated after just 9 interrupts (9.0288 ms). This way the time is updated by 10.0144 ms 56 times after 10.032 ms and one time after 9.0288 ms, which is a total elapsed time of 570.8208 ms with an adjustment of 57*10.0144 ms = 570.8208 ms. This corresponds to a total number of interrupts of 569 (57*100,144 = 569*10,032). As a result, the time will lose 17.6 μs for each of the 56 consecutive system time updates and then gain 0.9856 ms (56 x 17.6 μs) in the 57th interrupt interval.
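
The cycle can be checked with a small simulation. The sketch below assumes a simple update model: at every interrupt the elapsed time grows by one interrupt period, and the system time is advanced by one time increment as soon as it lags the elapsed time by at least one increment. This model is an assumption made for illustration (it is not taken from Windows internals), and the names CycleStats and simulate_update_cycle are hypothetical. All values are in 100 ns units.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t interrupts;    /* interrupts until the lag returns to zero */
    uint32_t updates;       /* system time updates within that cycle */
    uint32_t short_updates; /* updates that needed fewer interrupts than the first */
    uint64_t max_lag;       /* largest lag right after an update */
} CycleStats;

CycleStats simulate_update_cycle(uint64_t interrupt_period, uint64_t time_increment)
{
    CycleStats s = {0, 0, 0, 0};
    uint64_t elapsed = 0, system_time = 0;
    uint32_t since_update = 0, first_len = 0;

    /* The model only makes sense if an update needs at least one interrupt. */
    assert(interrupt_period > 0 && interrupt_period <= time_increment);

    do {
        elapsed += interrupt_period;
        s.interrupts++;
        since_update++;
        if (elapsed - system_time >= time_increment) {
            system_time += time_increment;
            s.updates++;
            if (first_len == 0)
                first_len = since_update;
            else if (since_update < first_len)
                s.short_updates++;
            if (elapsed - system_time > s.max_lag)
                s.max_lag = elapsed - system_time;
            since_update = 0;
        }
    } while (elapsed != system_time);

    return s;
}
```

For platform B, simulate_update_cycle(10032, 100144) reports 569 interrupts, 57 updates, exactly one shortened update, and a maximum lag of 9,856 units (0.9856 ms), which is recovered in the last update of the cycle. Platform A (156,250 for both arguments) completes its cycle after a single interrupt.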

2.1.3.  Changes of System File Time

The system time changes according to the described mechanisms after a certain period of time. Additional time changes happen when periodic time corrections are applied continuously to the system time over a longer period in order to adjust it to an external time reference. The occurrence and the parameters of this adjustment can be gathered by a call to GetSystemTimeAdjustment. Sudden time changes, for example, introduced by using the clock GUI or SetSystemTime(…), are not announced or predictable; they happen spontaneously.

Changes of the system time will have no influence on the expiration of Sleep periods or waitable timer periods. The actual change will be taken over by the routines here. Nevertheless, system time changes are discontinuities in time, whether they are sudden or spread over a longer period of time. What is an accurate time stamp supposed to deliver when the system inserts several hundred seconds at an interval of 1.0000032 s? The system will assume that the seconds are that long (elongated) for the time being. This can be accomplished by the temporary adaptation of the performance counter frequency to the applied granular time correction.

2.1.4.  Windows 7/8/8.1, Server 2008 R2/Server 2012/Server 2012 R2

Time services on Windows have changed with every new version of Windows, and the changes beyond Vista and Server 2008 are considerable. Because hardware and software evolve in parallel, the software has to stay compatible with a whole variety of hardware platforms, while new hardware enables the software to achieve better performance. Today's hardware provides the High Precision Event Timer (HPET) and an invariant Time Stamp Counter (TSC). The variety of timers is described in "Guidelines For Providing Multimedia Timer Support". The "IA-PC HPET Specification" is now more than 10 years old and some of its goals have not yet been reached (e.g., aperiodic interrupts). While QueryPerformanceCounter benefited from the HPET/TSC compared to the ACPI PM timer, for many applications the HPET has meanwhile been superseded by the invariant TSC. The typical HPET signature (a value of 156,001 for both TimeIncrement of GetSystemTimeAdjustment() and MinimumResolution of NtQueryTimerResolution()) disappeared with Windows 8.1, which goes back to the roots and reports 156,250 again. The TSC frequency is calibrated against HPET periods to finally get proper timekeeping.

An existing invariant TSC noticeably influences the behavior of GetSystemTimeAsFileTime(). The influence on the functions QueryPerformanceCounter() and QueryPerformanceFrequency() is described in sections 2.4.3. and 2.4.4. Windows 8 introduces the function GetSystemTimePreciseAsFileTime() "with the highest possible level of precision (<1us)". This seems to be the counterpart to the Linux gettimeofday() function.

2.1.4.1.  Resolution, Granularity, and Accuracy of System Time

Since Windows 7, the operating system runs tests on the underlying hardware to see which hardware is best used for timekeeping. When the processor's Time Stamp Counter (TSC) is suitable, the operating system uses the TSC for timekeeping. If the TSC cannot be used for timekeeping, the operating system reverts to the High Precision Event Timer (HPET). If that does not exist, it reverts to the ACPI PM timer. For performance reasons it shall be noted that HPET and ACPI PM timer cause IPC overhead, while the use of the TSC does not. The evolution of TSC shows a variety of capabilities:

Details of the TSC capabilities are described in "Intel 64 and IA-32 Architectures Software Developer’s Manual". Chapter 16.12.1 of this documentation provides the key for using the TSC for wall clock timer services:

"The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8].

The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."
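
The bit named in the quotation can be tested with a one-line helper once the EDX value of CPUID leaf 80000007H is available. The sketch below only shows the bit test and keeps the CPUID access out, because that part is compiler specific (for example the __cpuid intrinsic from <intrin.h> with MSVC, or __get_cpuid from <cpuid.h> with GCC and Clang). The function and macro names are hypothetical.

```c
#include <stdint.h>

/* CPUID.80000007H:EDX[8] signals invariant TSC support. */
#define CPUID_INVARIANT_TSC_BIT 8u

/* edx: value of EDX returned by CPUID leaf 0x80000007. Returns 1 or 0. */
int has_invariant_tsc(uint32_t edx)
{
    return (int)((edx >> CPUID_INVARIANT_TSC_BIT) & 1u);
}
```

Before leaf 80000007H is queried, the highest supported extended leaf (returned in EAX by leaf 80000000H) should be checked, since older processors do not implement it.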

An invariant TSC enables QueryPerformanceCounter(), QueryPerformanceFrequency(), and GetSystemTimeAsFileTime() to be served by the same hardware. Deviations as described in 2.4.3 do not exist when the performance counter values and the wall clock are backed by the same counter (TSC).

More information can be obtained in "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2".

Polling system time changes by repeated calls of GetSystemTimeAsFileTime() discloses a new behavior on Windows 8. The examples given in 2.1.1. and 2.1.2. are typical timekeeping schemes for systems running with an ACPI PM timer and a PIT timer, respectively; there, system time changes occurred on a regular basis. This is not the case on Windows 8: a whole range of varying file time increments is observed when polling for file time transitions. A truly periodic cycle can only be approximated by a "mean increment". However, this mean increment matches the result given by ActualResolution. Despite these little hiccups, resolution, granularity, and accuracy of GetSystemTimeAsFileTime() are comparable to earlier Windows versions.

2.1.4.2.  Desktop Applications: GetSystemTimePreciseAsFileTime()

This new Windows 8 API is restricted to desktop applications.

VOID WINAPI GetSystemTimePreciseAsFileTime(_Out_ LPFILETIME lpSystemTimeAsFileTime);

GetSystemTimePreciseAsFileTime() uses the performance counter to achieve the microsecond precision. Depending on the hardware platform and Windows version, a call to QueryPerformanceCounter may be expensive or not (HPET, ACPI PM timer, or TSC, see "MSDN: Acquiring high-resolution time stamps"). Consecutive calls may return the same result. The call time is less than the smallest increment of the system time. The granularity is in the sub-microsecond regime. The function may be used for time measurements but some care has to be taken: Time differences may be ZERO.

The function shall also be used with care when a system time adjustment is active. Current Windows versions treat the performance counter frequency as a constant. The high resolution of GetSystemTimePreciseAsFileTime() is derived from the performance counter value at the time of the call and the performance counter frequency. However, the performance counter frequency should be corrected during system time adjustments to adapt to the modified progress in time. Current Windows versions don't do this. The obtained microsecond part may be severely affected when system time adjustments are active. Seconds may consist of more or less than 1,000,000 microseconds. Microsoft may or may not fix this in one of the next updates/versions.
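
The dependency on the frequency can be illustrated with a small interpolation sketch. It is not Microsoft's implementation; it only shows the principle of deriving a file time from a reference pair (file time, counter value) plus a later counter reading, under the assumption that the counter frequency passed in is the one that is valid for the interval. The function name is hypothetical and all times are in 100 ns units.

```c
#include <stdint.h>

/*
 * Interpolate a file time from a reference pair and a later counter value.
 * ref_time:  file time captured together with ref_count
 * count:     current counter value (count >= ref_count)
 * frequency: counter ticks per second assumed for the interval
 */
uint64_t interpolate_file_time(uint64_t ref_time, uint64_t ref_count,
                               uint64_t count, uint64_t frequency)
{
    uint64_t ticks = count - ref_count;
    uint64_t whole_seconds = ticks / frequency;
    uint64_t remainder = ticks % frequency;

    /* Splitting keeps remainder * 10^7 within 64 bits for any
       frequency below roughly 1.8 * 10^12 Hz. */
    return ref_time + whole_seconds * 10000000ULL
                    + (remainder * 10000000ULL) / frequency;
}
```

If the counter advances by 3,579,545 ticks and the frequency is taken as 3,579,545 Hz, exactly 10,000,000 units (one second) are added. Evaluating the same tick count with 3,579,645 Hz yields less than one second, which is the kind of deviation that appears when the assumed frequency does not follow the actual progress of time.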

GetSystemTimePreciseAsFileTime() works on all platforms.

2.2.  The Sleep API

The Sleep function suspends the execution of the current thread for a specified interval.

VOID Sleep(DWORD dwMilliseconds);

This would indeed be a very useful function if it were doing what it is supposed to do. Unfortunately, a detailed view discloses some artifacts, some of which are helpful, and others that are not. The Sleep() function is backed up by the system's interrupt services. As described in section 2.1, the interrupt period can be configured to some extent. This has a direct impact on Sleep(). The call to Sleep() passes the parameter dwMilliseconds to the system and expects the function to return after dwMilliseconds. In practice the Sleep() only returns when two conditions are met: Firstly, the requested delay must be expired and secondly an interrupt has occurred (the test to see if the requested delay has expired is only done with an interrupt). A simple Sleep(1) call may therefore have a number of different results. The results also depend on the time at which the call was made with respect to the interrupt period phase.

Say the ActualResolution is set to 156,250, the interrupt heartbeat of the system will run at 15.625 ms periods or 64 Hz and a call to Sleep is made with a desired delay of 1 ms. Two scenarios are to be looked at:

The observed delay heavily depends on the time at which the call was made. This matters particularly when the desired delay is shorter than the ActualResolution. However, when the ActualResolution is set to MaximumResolution, the system runs at its maximum interrupt frequency and the deviations are in the order of one interrupt period.
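
The dependency on the call time can be expressed as a small model. The sketch below assumes that Sleep() returns at the first interrupt at which the requested delay has been exceeded, and that interrupts are strictly periodic; scheduling latencies are ignored. The function name and the parameter phase (the time elapsed since the last interrupt at the moment of the call) are introduced for this illustration only. All values are in 100 ns units.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Modeled delay of a sleep request.
 * period:    interrupt period (ActualResolution)
 * phase:     time since the last interrupt when the call is made
 * requested: requested delay
 */
uint64_t observed_sleep_delay(uint64_t period, uint64_t phase, uint64_t requested)
{
    assert(period > 0 && phase < period);

    uint64_t to_next = period - phase;   /* time until the next interrupt */
    if (to_next > requested)
        return to_next;                  /* the next interrupt is late enough */

    /* Whole periods after the next interrupt until the request is exceeded. */
    uint64_t extra = (requested - to_next) / period + 1u;
    return to_next + extra * period;
}
```

With a period of 156,250 and a request of 10,000 (1 ms), a call made right at an interrupt returns after 15.625 ms, a call made 2 ms before the next interrupt returns after 2 ms, and a call made 0.5 ms before the next interrupt returns after 16.125 ms. With a period of 9,766 and a call synchronized to the interrupt, a 1 ms request yields 19,532 units, which matches the Sleep(1) result in 2.1.1.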

This behavior can be used to synchronize code with the interrupt period in an easy way by simply calling two or more consecutive sleeps. Regardless of what ΔT is, the first will end at the time of an interrupt. Consequently the following sleep call will start at the interrupt time (or at least so close to it that the system will assume that it happened at the same time). As a result a ΔT = 0 applies and the sleep will return when N x ActualResolution becomes larger than the desired period. Right after the return of a sleep, the system has just processed an interrupt. Conditional latency may be on board due to a priority and/or task/process switching delay or due to interrupt handler CPU capture reasons. Typical latencies of a few μs can be observed with very little implementation effort.

A special case is the call Sleep(0). It looks meaningless, but it is a very powerful tool since it relinquishes the remainder of the thread's time slice. That means that other threads of equal priority level will take over when ready to run. When a number of threads are running at the same priority level and all of them are very responsive, all of them will make frequent calls to Sleep(0) whenever they can afford it. As a result, a task switch can be forced to happen in just a few μs.

2.3.  The WaitableTimer API

Another important mechanism for performing timed operations is provided by the waitable timer interface:

HANDLE WINAPI CreateWaitableTimer(IN LPSECURITY_ATTRIBUTES lpTimerAttributes,
IN BOOL bManualReset,
IN LPCTSTR lpTimerName);

The returned handle is used to set up a timer function:

BOOL WINAPI SetWaitableTimer(IN HANDLE hTimer,
IN const LARGE_INTEGER* pDueTime,
IN LONG lPeriod,
IN PTIMERAPCROUTINE pfnCompletionRoutine,
IN LPVOID lpArgToCompletionRoutine,
IN BOOL fResume);

This tool can be used in a variety of ways. Below are just a few things that need to be mentioned within the scope of this description:

The expired (signaled) timer can be handled by means of an asynchronous procedure call (APC) or by means of a call to WaitForSingleObject, for example. According to the last point above, the former is useless when high accuracy is required. The latter suits the needs of the mechanisms described here much better. The API needs the handle to the object to wait for and allows specifying a timeout dwMilliseconds, which can optionally be set to INFINITE.

DWORD WINAPI WaitForSingleObject(IN HANDLE hHandle,IN DWORD dwMilliseconds);

Waitable timers synchronize to the rhythm of the system's interrupt period (ActualResolution). This has to be kept in mind because it has severe implications for the system's overall performance. All of the tasks waiting for a Sleep() or a timer to reach a signaled state will continue after the interrupt has occurred. The system's load tends to reach peaks at interrupts.

2.4.  The QueryPerformanceCounter and QueryPerformanceFrequency API

This API is backed by a virtual counter running at a "fixed" frequency started at boot time. The following two basic calls are used to explore the microsecond regime: QueryPerformanceCounter() and QueryPerformanceFrequency(). The counter values are derived from some hardware counter, which is platform dependent. However, the Windows version also influences the results by handling the counter in a version-specific manner. Windows 7, in particular, has introduced a new way of supplying performance counter values.

2.4.1.  QueryPerformanceCounter

The call to

BOOL QueryPerformanceCounter(OUT LARGE_INTEGER *lpPerformanceCount);

will update the content of the LARGE_INTEGER structure PerformanceCount with a count value. The count value is initialized to zero at boot time.

2.4.2.  QueryPerformanceFrequency

The call to

BOOL QueryPerformanceFrequency(OUT LARGE_INTEGER *lpFrequency);

will update the content of the LARGE_INTEGER structure PerformanceFrequency with a frequency value. The frequency is treated by the system as a constant. From Windows 7/Server 2008 R2 onwards, the result of QueryPerformanceFrequency() may be calibrated at boot time and may therefore return varying results, depending on the underlying hardware (see 2.1.4.1.). However, QueryPerformanceFrequency() never reports any changes of the frequency during operation; its result remains constant. The following section describes deviations on systems on which the underlying hardware provides neither an invariant TSC nor an HPET for time services.

2.4.3.  Performance of the Performance Counter

The range in time that can be held by the LARGE_INTEGER structure PerformanceCount depends on the update rate, i.e. the frequency at which the count increases. Depending on the hardware platform, the counter may be an Intel 8254 at 1,193,182 Hz, an ACPI Power Management Timer chip with an update frequency of 3,579,545 Hz, or yet another source. A number of platforms do not have these timers at all; they mimic the timer by providing the CPU clock. As a result of the latter, the frequency can get into the GHz range. PerformanceCount.QuadPart (signed) will change sign after 2^63 increments. At a frequency of, say, 1 GHz (10^9 s^-1), such a system can run for about 290 years without reaching the sign bit. Even for multi-GHz platforms, there does not seem to be a serious limit.
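
The estimate follows from dividing 2^63 by the counter frequency. A minimal sketch of this arithmetic is given below; the function names are hypothetical, and a year is taken as 365.25 days.

```c
/* Seconds until a signed 64-bit counter that starts at zero reaches 2^63. */
double seconds_until_sign_change(double frequency_hz)
{
    return 9223372036854775808.0 / frequency_hz;   /* 2^63 / f */
}

/* The same span expressed in years of 365.25 days. */
double years_until_sign_change(double frequency_hz)
{
    return seconds_until_sign_change(frequency_hz) / (365.25 * 86400.0);
}
```

For 1 GHz the result is roughly 292 years; for the ACPI PM timer frequency of 3,579,545 Hz it is in the order of 80,000 years.
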
However, regardless of how the system treats it, the frequency cannot be considered constant. Firstly, the frequency generating hardware will deviate from the specified value by an offset, and secondly the frequency may vary (e.g., due to thermal drift). The impact of these deviations is not negligible. Oscillators have tolerances in the range of a few ppm and would consequently introduce errors of a few μs/s in the measured time period. Within this description the performance counter will be used to predict time intervals over a few seconds at accuracies better than 1 μs. If an accuracy of 0.1 μs is to be reached after 10 s, the frequency needs to be known to 0.01 ppm, which corresponds to 0.035 Hz at a nominal frequency of 3,579,545 Hz. Obviously, that value is not provided by the system and needs to be calibrated. A first estimate of the true frequency can be gathered by querying two counter values at a certain (known) time apart from each other. The code snippet uses the API call

DWORD timeGetTime(VOID);

and could look like this:

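/* requires <windows.h>; timeGetTime() additionally needs winmm.lib */
DWORD ms_begin, ms_end;
LARGE_INTEGER count_begin, count_end;
double ticks_per_second;
ms_begin = timeGetTime();
QueryPerformanceCounter(&count_begin);
Sleep(1000);
ms_end = timeGetTime();
QueryPerformanceCounter(&count_end);
/* counter ticks per elapsed millisecond, scaled to ticks per second */
ticks_per_second = 1000.0 * (double)(count_end.QuadPart - count_begin.QuadPart) / (double)(ms_end - ms_begin);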

However, due to the artifacts described in 2.2, timeGetTime() is accompanied by an inaccuracy of up to 2 ms, thus a Sleep(1000) would give an accuracy for ticks_per_second of 0.002 (2,000 ppm) at best. An accuracy of 2 ppm would be achievable when the Sleep extends to 1,000,000 ms or 1,000 s. In order to obtain 0.01 ppm, the Sleep would have to cover more than 55 hours. This is obviously a hopeless approach. It also averages out temporary changes of the frequency and cannot follow frequency changes due to thermal drift. The thermal drift of the performance counter frequency can be severe:
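
The numbers in this paragraph follow from two simple ratios, sketched below. The function names are hypothetical; the timing error of 2 ms and the nominal frequency of 3,579,545 Hz are the values used in the text.

```c
/* Measurement span in seconds needed so that a fixed timing error
   (in ms) contributes no more than target_ppm to the result. */
double required_span_seconds(double timing_error_ms, double target_ppm)
{
    return (timing_error_ms * 1e-3) / (target_ppm * 1e-6);
}

/* Absolute frequency tolerance in Hz for a relative tolerance in ppm. */
double tolerance_hz(double nominal_hz, double ppm)
{
    return nominal_hz * ppm * 1e-6;
}
```

A 2 ms error and a target of 2 ppm require a span of 1,000 s; a target of 0.01 ppm requires 200,000 s, which is about 55.6 hours. At 3,579,545 Hz, 0.01 ppm corresponds to roughly 0.036 Hz.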


Fig. 2.4.3.1: Calibrated Performance Counter Offset on ACPI PM timer hardware.

This graph shows an older system with heavy thermal drift. At boot time (~8:00) the measured performance counter frequency is off by about 60 Hz. The system reports the performance counter frequency as 3,579,545 Hz. In fact, it is already at 3,579,605 Hz when it is "cold". After many hours of doing nothing, the system seems to reach thermal equilibrium. At ~14:00 (six hours after boot), the system was heavily loaded for about 45 minutes and consequently warmed up. The load increased the main board temperature by only 5 °C, but the influence on the measured performance counter frequency is quite considerable. It rose to an offset of almost 100 Hz or a true performance counter frequency of 3,579,645 Hz. A 100 Hz offset at a base frequency of 3,579,605 Hz is a deviation of about 28 ppm or an error in time of 28 μs/s.

The calibration procedure used for the time stamp mechanism described here uses a repeated averaging period evaluation and reaches an accuracy of better than 0.05 ppm after about 100s. Thermal drifts can be captured reasonably well and can be applied without much delay. (Note: The declaration of ticks_per_second as a 64-bit float in the code snippet above enables the ticks_per_second to hold a number with an accuracy of 15 digits. A value of 3,579,545.12 Hz shows the 0.01 ppm accuracy in the last digit.)

The use of QueryPerformanceCounter on multi-processor platforms implies that the call is made on the same processor all the time. The SetThreadAffinityMask API and its associated calls are used to ensure this. This rule only applies to systems using non-invariant TSC hardware. The system analyzed in this chapter operates time services based on ACPI PM hardware.

2.4.4.  Is the CPU Time Stamp Counter an Alternative?

RDTSC is the instruction for querying the time stamp counter of the CPU. With the advent of multi-processor platforms and multi-core processors, the use of RDTSC became highly inadvisable. Newer processors also support adaptive CPU frequency adjustments, which is just another reason not to use RDTSC for the purpose discussed here. Microsoft strongly discourages using the TSC for high-resolution timing ("Game Timing and Multicore Processors"). However, the introduction of invariant Time Stamp Counters has changed the situation. Starting with Windows 7/Server 2008 R2, Windows has a clear preference: Look for invariant TSCs, see whether they can be synchronized on different cores, and use them for wall clock and performance counter whenever possible ("MSDN: Acquiring high-resolution time stamps").

2.5.  Discussion of Resources

Some of the resources discussed show platform-specific behavior. They may deliver results depending on the hardware and/or on the Windows version. The precision time functions developed within the Windows Timestamp Project mainly rely on four function suites provided by the operating system:

The complexity of the system time update with respect to the interrupt settings was explained and is understood. A complex automatic diagnosis of the system has to establish proper settings in order to obtain the desired accuracy. In particular, the continuous calibration of the performance counter frequency described in 2.4.3 is of utmost importance to obtain high accuracy. In addition, the proper interrupt period setting to obtain truly cyclic timer behavior (as described, for example, in 2.1) is very important. Another set of APIs is used to establish functionality:

The description of these functions falls outside the scope of this description.

3.  Goals

The Windows Timestamp Project provides the tools to enable access to time at microsecond resolution and accuracy. Furthermore, it provides timer functions at the same resolution and accuracy. The high accuracy and microsecond resolution are achieved by synchronizing the system time with the performance counter. In fact, the performance counter is phase locked to the system time. A diagnosis determines the system's specific parameters and establishes a "truly cyclic" timer interval for updating the phase of the performance counter value. The drift of the performance counter is permanently evaluated and taken into account while the system is running.

The code runs in a real-time priority process providing time information. An auxiliary IO process builds the interface to an optional graphical user interface. Nonblocking IO enables proper performance testing and debugging.

3.1.  Time Support

Any time providing mechanism needs time for its internals. Thus, the following question arises with respect to time: Is the time requested at the time the call is made or shall the time be reported at the time in which the call returns? This may sound strange, but considering the level of resolution and accuracy aimed for here, it matters.

Example:

Two time functions are implemented to fulfill these two needs:

3.1.1.  GetTimeStamp

The function GetTimeStamp, declared as

void GetTimeStamp(TimeStamp_TYPE * TimeStamp);

fills the argument pointed to by TimeStamp with numbers according to the TimeStamp structure definition:

typedef struct {
long long Time;
long long ScheduledTime;
long Accuracy;
} TimeStamp_TYPE;

The 64-bit value Time represents the number of 100-nanosecond intervals elapsed since January 1, 1601. ScheduledDueTime (declared as ScheduledTime in the structure above) reports the system file time at which the next reference time is scheduled for an attempt to update the phase. This value should primarily be used to verify the operation of the precision time mechanism. If ScheduledDueTime is noticeably behind the current system file time, the scheduled update of the time reference must have failed for a number of consecutive attempts.
Finally, the 32-bit value Accuracy gives an estimate of the assumed accuracy (rms) of the time stamp in 1 ns units (error in ns/s).

The call to GetTimeStamp is fast (a few thousand CPU cycles max.) and it reports the time at the time it is called.
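Since Time counts 100-nanosecond intervals since January 1, 1601, a caller will often want to convert it to another epoch. The following is a minimal sketch of such a conversion to microseconds since January 1, 1970; the helper names are hypothetical and not part of the library.

```c
#include <stdint.h>

/* Number of 100-ns intervals between 1601-01-01 and 1970-01-01
   (11644473600 seconds). */
#define EPOCH_DIFF_100NS 116444736000000000LL

/* Hypothetical helper: convert a value in 100-ns units since 1601
   (as found in the Time member) to microseconds since 1970. */
int64_t timestamp_to_unix_us(int64_t time_100ns)
{
    return (time_100ns - EPOCH_DIFF_100NS) / 10;
}

/* Hypothetical helper: the inverse conversion. */
int64_t unix_us_to_timestamp(int64_t unix_us)
{
    return unix_us * 10 + EPOCH_DIFF_100NS;
}
```

The integer division drops the last 100-ns digit; callers that need it can keep the remainder of the division by 10.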

3.1.2.  Time

A simple function is declared as

long long Time(void);

The function is as fast as GetTimeStamp and it returns the time at the time the call returns. Needing a few thousand CPU cycles, the call takes only a few μs on current hardware. Time() can be used to compare times or to wait until a certain time is observed. The 64-bit return value represents the number of 100-nanosecond intervals elapsed since January 1, 1601.
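Waiting until a certain time is observed can be sketched as a simple polling loop. In the sketch below the time source is passed as a function pointer so that the loop can be tried without the library; wait_until, fake_time and fake_ticks are hypothetical names, and fake_time merely stands in for Time().

```c
/* Signature of a time source such as Time(): 100-ns units. */
typedef long long (*time_source_fn)(void);

/* Poll the time source until due_time is reached.
   Returns the overshoot (observed time minus due time) in 100-ns units. */
long long wait_until(time_source_fn now, long long due_time)
{
    long long t;
    do {
        t = now();
    } while (t < due_time);
    return t - due_time;
}

/* Stand-in clock for experiments: advances 7 units per call. */
long long fake_ticks = 0;
long long fake_time(void)
{
    fake_ticks += 7;
    return fake_ticks;
}
```

With the real library, the call would presumably read wait_until(Time, due). Note that such a loop occupies a CPU core for the whole wait, so it only suits very short waits.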

3.2.  Timer Support

A set of timer functions:

HANDLE CreateTimedEvent(BOOL bManualReset,
LPCTSTR lpTimerName);

bManualReset [in]

If this parameter is TRUE, the function creates a manual reset event object, which requires the use of the ResetEvent function to set the event state to nonsignaled. If this parameter is FALSE, the function creates an auto reset event object, and the system automatically resets the event state to nonsignaled after a single waiting thread has been released.

lpTimerName [in, optional]

The name of the event object. The name is limited to MAX_PATH characters. Name comparison is case sensitive. If lpTimerName matches the name of an existing named event object, this function will fail. If lpTimerName is NULL, the event object is created without a name. If lpTimerName matches the name of another kind of object in the same namespace (such as an existing semaphore, mutex, waitable timer, job, or file-mapping object), the function fails and the GetLastError function returns ERROR_INVALID_HANDLE. This occurs because these objects share the same namespace. The name can have a "Global\" or "Local\" prefix to explicitly create the object in the global or session namespace. The remainder of the name can contain any character except the backslash character (\). For more information, see Kernel Object Namespaces. Fast user switching is implemented using Terminal Services sessions. Kernel object names must follow the guidelines outlined for Terminal Services so that applications can support multiple users. The object can be created in a private namespace. For more information, see Object Namespaces.

Return value

If the function succeeds, the return value is a handle to the event object. If the named event object existed before the function call, the function returns NULL and GetLastError returns ERROR_ALREADY_EXISTS. If the function fails, the return value is NULL. To get extended error information, call GetLastError.


int SetTimedEvent(HANDLE hTimerEvent,
long long TimerDueTime,
long long TimerPeriod);

hTimerEvent [in]

A handle to a named timed event. The CreateTimedEvent() function returns this value.

TimerDueTime [in]

The time after which the state of the timer is to be set to signaled, in 100 nanosecond intervals. Positive values indicate absolute time. Be sure to use a UTC-based absolute time, since the system uses UTC-based time internally. Negative values indicate relative time.

TimerPeriod [in]

The period of the timer in 100 ns intervals. If TimerPeriod is zero, the timer is signaled once. If TimerPeriod is greater than zero, the timer is periodic. A periodic timer automatically reactivates each time the period elapses, until the timer is canceled using the CancelTimedEvent function or reset using SetTimedEvent. If TimerPeriod is less than zero, the function fails.

Return value

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero. To get extended error information, call GetLastError.
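The sign convention and the 100 ns unit of TimerDueTime and TimerPeriod are easy to get wrong. The helpers below are a small sketch of the unit handling only; their names are hypothetical, and the way a relative value is resolved against the current time is an assumption derived from the parameter description above.

```c
/* Relative due time for a delay given in microseconds:
   negative value, 100-ns units. */
long long relative_due_time_us(long long delay_us)
{
    return -(delay_us * 10);
}

/* Timer period for an interval given in microseconds: 100-ns units. */
long long timer_period_us(long long period_us)
{
    return period_us * 10;
}

/* Assumed interpretation of TimerDueTime: negative values are
   relative to 'now', positive values are absolute times. */
long long resolve_due_time(long long due_time, long long now)
{
    return (due_time < 0) ? now - due_time : due_time;
}
```

For example, a periodic timer that first fires after 250 μs and then every 1 ms would use relative_due_time_us(250) and timer_period_us(1000) as the last two arguments of SetTimedEvent.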


int CancelTimedEvent(HANDLE hTimerEvent);

hTimerEvent [in]

A handle to a named timed event. The CreateTimedEvent() function returns this value.

Return value

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero. To get extended error information, call GetLastError.


HANDLE OpenTimedEvent(LPCTSTR lpTimerName);

lpTimerName [in]

The timed event name used when the timed event was created.

Return value

If the function succeeds, the return value is the handle to the named timed event. If the function fails, the return value is NULL. To get extended error information, call GetLastError.


int DeleteTimedEvent(HANDLE hTimerEvent);

hTimerEvent [in]

A handle to a named timed event. The CreateTimedEvent() function returns this value.

Return value

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero. To get extended error information, call GetLastError.


These timer functions are based on timed events. The handle returned by CreateTimedEvent() is in fact a handle to a named event whose signaled state is supervised by a time service routine. Standard wait functions like WaitForSingleObject or WaitForMultipleObjects can be used to wait for the high resolution timer events.

4.  Implementation

Only two hardware platforms were described here to highlight some of the problems to bear in mind when implementing reliable time services for Windows. Many more configurations need to be diagnosed to ensure platform-independent functionality to a large extent. However, a flexible and automatic evaluation of hardware-specific behavior may result in hardware independence.

The implementation of all the above into a time service is done by careful separation into different processes and threads. The time critical parts are hosted by a process running at real-time priority class. Some of the threads inside this process even run at time-critical priority level. In the case of a multi-processor or multi-core system, certain threads are assigned to a specific CPU/core. This process is the Kernel; it hosts the time service routines. For testing and debugging, the Kernel process has some IO capabilities shared with the IO process. A later version may not need this additional functionality. The high priority class requires the Kernel process to run with administrator privileges.

A second process hosts all kinds of less time critical service threads. It shares some IO service with the Kernel process by means of piped IO between these two processes. Furthermore, it provides pipe services to the graphical user interface (GUI).

The third process is a graphical user interface (GUI), which runs optionally and helps in the current stage of the development to get an insight into what is going on.

The GUI and the IO process are development tools only. The only process that needs to run to access the time functions discussed here is the Kernel process.

4.1.  The Real-time Priority Class Process: Kernel

The Kernel is the heart of the time service described here. It provides the important link between the system file time and the performance counter value. The idea in this context is to provide data triplets of system file time, performance counter, and performance counter frequency. Knowing the performance counter value at a certain system file time allows the extrapolation of the system file time to the actual time by applying the performance counter value and the performance counter frequency. As discussed, the performance counter frequency is of insufficient accuracy; a refined performance counter frequency is supplied in format double (64bit float). There is also some internal information which allows a refinement of the performance counter value itself (as a result of some self-calibration). Thus, it is also represented in double (64bit float) format.

A typical result of such a data triplet could be:

This information is sufficient to establish time services. Querying the current performance counter value gives the difference to the value calculated to match the last captured file time. This difference is divided by the performance counter frequency and the result is the elapsed time since the last file time capture. This data triplet is, besides other parameters, written to a mutex protected shared memory section. Other processes/threads have access to this data triplet.
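The extrapolation described above can be sketched in a few lines. The structure layout and the names time_triplet and extrapolate_time are assumptions made for illustration; the shared memory access and its mutex are left out.

```c
/* Assumed layout of the data triplet described in the text. */
typedef struct {
    long long file_time;  /* captured system file time, 100-ns units */
    double    counter;    /* refined performance counter value at capture */
    double    frequency;  /* calibrated performance counter frequency, Hz */
} time_triplet;

/* Extrapolate the file time to the moment at which counter_now was read.
   Assumes counter_now is not older than the reference capture. */
long long extrapolate_time(const time_triplet *ref, long long counter_now)
{
    double elapsed_s = ((double)counter_now - ref->counter) / ref->frequency;
    /* convert seconds to 100-ns units and round to nearest */
    return ref->file_time + (long long)(elapsed_s * 1e7 + 0.5);
}
```

With a counter frequency of 2.5 MHz, for instance, 25 counts after the capture correspond to 10 μs, i.e., 100 units of 100 ns added to the captured file time.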

As described in sections 2.1 and 2.3, the important part is to get the file time updated correctly. It proves best to gather the data triplet exactly when the file time transits or has just transited. Difficulties in achieving this have been described for platform examples A and B. At startup, a complex diagnosis of the interrupt timing structure and file time update/transition structure is performed. This results in a timing scheme for updating the data triplet. The desired update period is in the range of 1 to 10 seconds. As discussed, the period duration influences the accuracy. Algorithms look for patterns and beat frequencies in the file time update and interrupt timing structure. As a result, a periodic timer is set up to run the data triplet generation and the calibration in parallel. Once exact ∆T file time periods do occur, the true performance counter frequency can be measured and averaged over a number of consecutive measurements. A running average over the last n captures is maintained at all times to provide information about the true (calibrated) performance counter frequency. When the accuracy of the average reaches a certain quality, the phase locking of file time change and performance counter is considered as established and timestamp requests are accompanied by information about their accuracy.
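The running average over the last n captures can be sketched with a small ring buffer. The window size of 8 and all names below are assumptions for illustration; the actual service additionally has to decide which captures are valid, which is not shown.

```c
#define FREQ_WINDOW 8  /* n: number of captures averaged (assumed value) */

typedef struct {
    double samples[FREQ_WINDOW]; /* measured frequencies in Hz */
    int    count;                /* valid samples, at most FREQ_WINDOW */
    int    next;                 /* slot overwritten by the next sample */
} freq_average;

/* Add one measurement: counter_delta counts were observed while the
   file time advanced by file_time_delta (100-ns units).
   Returns the average frequency over the current window in Hz. */
double freq_average_add(freq_average *avg,
                        long long counter_delta,
                        long long file_time_delta)
{
    double sum = 0.0;
    int i;

    avg->samples[avg->next] =
        (double)counter_delta * 1e7 / (double)file_time_delta;
    avg->next = (avg->next + 1) % FREQ_WINDOW;
    if (avg->count < FREQ_WINDOW)
        avg->count++;

    /* Recompute the sum each time to avoid accumulating rounding errors. */
    for (i = 0; i < avg->count; i++)
        sum += avg->samples[i];
    return sum / avg->count;
}
```

A zero-initialized freq_average is a valid empty window. The spread of the samples in the window could serve as the quality measure mentioned above, but that is left out here.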

Running all of this at utmost priority ensures that there is very little overhead after an interrupt. Remember: Many processes/threads are waiting for interrupts. Therefore, systems do have a workload peak at the occurrence of an interrupt. Even running at such priority settings, it is unavoidable to be influenced by the load of other processes. However, the accuracy of this scheme easily stays below a few microseconds, even with heavy load on the system.

The routines Time() and GetTimeStamp() apply the extrapolation scheme described here. Both calls complete in far less than 5 μs, even on older systems.

The functionality of the timer routines listed in 3.2 is handled in this real-time process as well. Timed events are registered in a timer event queue. They are monitored with respect to their due time/period. When there is less than one interrupt period left before the due time expires, the timer service polls the timed event queue for the precise time to set the event. This may happen for a number of timed events, even within the same interrupt period. However, it should be noted that the time service thread is running at a high priority level and the signaled event may not be accessible to other processes/threads when there is just one CPU. A single CPU/core system simply cannot cope with multiple timed events set up to signal within the same interrupt period.

4.2.  Less critical services: The IO-Process

In order to keep the kernel as small as possible, much of the functionality is performed by a second process. The IO, in particular, matters. The IO process establishes pipe services to relieve the kernel of blocking IO. All IO done by the kernel is queued into the IO process's pipe service. These operations are nonblocking. A complex fprintf() can be queued in just a few microseconds. This allows extensive output for diagnosis. Furthermore, output is logged into a file.

4.3.  The Optional GUI-Process

The current GUI is mainly created for developing the time service. Meanwhile, it has become a valuable tool for diagnosing platforms. It runs optionally.


Fig. 4.3.1: The Graphical User Interface (Version 1.22).

The output is split into four tabs: the all output tab, the error messages tab, the Calibrated Performance Counter Offset tab, and the NTP Offset tab. The text output within the first two tabs is produced using the queued qfprintf(…) function. This function time-stamps its message and also shows some other parameters of the output piping thread:

The output line format is:

yyyy-mm-dd hh:ii:ss.μμμμμμ.n (s/a) [PID.THID.Processor.Priority]: Message
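The time stamp part of such a line can be produced from a 100-ns time value as sketched below. Several points are assumptions: the digit after the microseconds is taken to be the remaining 100-ns digit, the time is rendered as UTC, and format_timestamp is a hypothetical helper rather than the function used by qfprintf. The (s/a) and [PID.THID.Processor.Priority] fields are not covered.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Seconds between 1601-01-01 and 1970-01-01. */
#define SECONDS_1601_TO_1970 11644473600LL

/* Format a time in 100-ns units since 1601 as
   "yyyy-mm-dd hh:ii:ss.uuuuuu.n" (UTC). Returns 0 on success, -1 on
   failure. Only times on or after 1970-01-01 are handled. */
int format_timestamp(long long time_100ns, char *buf, size_t size)
{
    time_t secs = (time_t)(time_100ns / 10000000LL - SECONDS_1601_TO_1970);
    long long frac = time_100ns % 10000000LL; /* 100-ns units in the second */
    struct tm *utc = gmtime(&secs);
    size_t len;
    int n;

    if (utc == NULL || strftime(buf, size, "%Y-%m-%d %H:%M:%S", utc) == 0)
        return -1;
    len = strlen(buf);
    /* six digits of microseconds, then the remaining 100-ns digit */
    n = snprintf(buf + len, size - len, ".%06lld.%lld", frac / 10, frac % 10);
    return (n < 0 || (size_t)n >= size - len) ? -1 : 0;
}
```

gmtime() returns a pointer to static storage, so a multi-threaded service would need a thread-safe variant instead.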

As already mentioned, the GUI runs optionally and any number of GUIs can be started and ended at any time. Ending a GUI will end neither the kernel process nor the IO process. In order to terminate the whole group of processes, the Kernel process has to be stopped. The Stop Kernel button (lower right corner) stops the kernel. By doing so, queued messages that are supposed to be processed are stuck. A few message windows will pop up to show the contents of the unprocessed parts of the queues of all involved processes. These popup windows are not error messages; they just report what was happening while the Kernel was stopped.

The plot at the left lower corner shows the history of the accuracy in μs/s during the last 600 seconds. The GUI produces this information by means of GetTimeStamp() imported from the time service DLL.

It also provides a tiny test of the timer functionality: A single shot timer can be set up. The due time setting here is absolute, thus the time has to be in the future. Hint: Use the Update Date/Time Fields button to get the actual time into the fields and then, e.g., increment the minute field by 1. Press the Create Timed Event button quickly before the due time expires. Progress of the timed event approaching its due time is shown next to the button, which has now converted into a Stop Timed Event button to allow cancellation of the timed event. A message window will pop up when the timed event has signaled. It shows the precise time at which the signaled state was detected and how much it deviates from the requested due time.

The output of the all output tab can be stopped (Hold Output button). All output is then queued, and the button turns into a Continue Output button until it is pressed. An optional auto cont. check box lets the GUI continue automatically when the queue buffer reaches a critical stage. The auto cont. check box can only be checked while the output is held.

The Calibrated Performance Counter Frequency Offset tab shows the offset of the calibrated performance counter frequency. The graph shown in 2.4.3 was created within this tab. The graph's context menu (right mouse button) allows saving the graph or clearing the graph's data. Clearing the data will not stop further recording; creation of the graph will continue. Version 1.2 introduced the NTP Offset tab, the NTP/autoadjust status line, and the NTP/autoadjust check boxes. Details about these items are given in "Part II: Adjustment of System Time".

4.4.  The Libraries

The functions described above are accessible to other processes/threads through a static library (LIB) or a dynamic link library (DLL).

5.  Results

Microsecond resolution time stamps are possible on Windows systems. Resolution in the microsecond regime can be observed at accuracies of a few microseconds without distracting the system too much. Timer functions at the same resolution and accuracy are implemented and tested. Handling many timed events created by those timer functions set up to fire within the same millisecond is tricky but possible. The evaluation at the startup of the services may sometimes take a few seconds and needs all the CPU time. Doing this at utmost priority will freeze single core/processor systems for a moment.


A pdf version of "Microsecond Resolution Time Services for Windows" can be downloaded here.

Part II: Adjustment of System Time

Arno Lentfer, June 2013
Updated in October 2013 / February 2014 (V 1.22 &V 1.30 / V 1.50)

Windows provides the following simple tools to manage and monitor system time adjustments: The Internet Time GUI and the console application w32tm.exe. These tools are sufficient to obtain an initial rough estimate of the performance of the Windows internet time synchronization.

1.  The Internet Time GUI

Synchronization to an internet time server is accomplished directly from the user interface. Windows Vista, Windows 7 and Windows 8 provide the Internet Time Settings window and Windows XP provides the Internet Time tab in the Date and Time Properties window:

Fig. 1.1: Internet Time Settings window of Windows Vista and higher.

Fig. 1.2: Internet Time Settings window of Windows XP.

An internet time provider can be chosen from a list or a new NTP server address can be added to the list. It is also possible to add an IP address to the list. Adding an IP address may be advisable when the name represents a pool of servers and the server needs to be explicitly indicated.

The common "Update Now" button will attempt to synchronize the system time to the time server. Synchronization either takes place immediately or becomes active upon confirmation. Note: The message "...has been successfully synchronized..." does not necessarily mean that synchronization has finished. It could also mean that a synchronization process was successfully started. Such processes can last for many hours.

2.  w32tm.exe

In order to verify the result or progress of the synchronization, another tool has to be run in parallel. The console application w32tm.exe allows monitoring of the offset of the local time to the time of an internet time server.

The easiest way to do this is from a console window with the following set of parameters:

w32tm /stripchart /computer:time.windows.com /period:120

As a result, the system time and its offset to the time server are dumped to the console every 120 seconds:

Tracking time.windows.com [65.55.21.14.123]
08:38:57 d:+00.0417301s o:+00.1024506s [ * ]
08:40:57 d:+00.0418632s o:+00.1037897s [ * ]
08:42:58 d:+00.0419165s o:+00.1015612s [ * ]
08:44:58 d:+00.0417048s o:+00.0985075s [ * ]
08:46:58 d:+00.0419394s o:+00.0942827s [ * ]
08:48:58 d:+00.0419296s o:+00.0913788s [ * ]
08:50:58 d:+00.0418867s o:+00.0883421s [ * ]

Each line consists of the local time (08:38:57), an internal delay (the time difference between the UDP packet received and the UDP packet sent on the server side, e.g., d:+00.0417301s), the actual offset between the local time and the server time (o:+00.1024506s), and a very basic strip chart of the offset.
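For longer recordings it is convenient to extract the delay and the offset from each line. The following sketch parses a line of the form shown above with sscanf; it assumes exactly that layout, and parse_stripchart_line is a hypothetical helper.

```c
#include <stdio.h>

/* Parse one w32tm stripchart line such as
   "08:38:57 d:+00.0417301s o:+00.1024506s [ * ]".
   On success, stores delay and offset (seconds) and returns the local
   time of day in seconds; returns -1 if the line does not match. */
int parse_stripchart_line(const char *line, double *delay_s, double *offset_s)
{
    int h, m, s;

    if (sscanf(line, "%d:%d:%d d:%lfs o:%lfs",
               &h, &m, &s, delay_s, offset_s) != 5)
        return -1;
    return h * 3600 + m * 60 + s;
}
```

Header lines such as the "Tracking ..." line do not match the pattern and yield -1, so they can simply be skipped.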

The first output line of w32tm will also resolve the name of the time server (time.windows.com) to an IP address (UDP port 123 is reserved for NTP). This is important because time.windows.com does not refer to a single server but rather to a pool of servers; therefore, consecutive attempts to synchronize to it may use different physical servers. However, w32tm shows the IP address of the server it is currently using. This IP address can also be chosen as the server for the synchronization. For example, one of the addresses of the time.windows.com pool is 65.55.21.14. The best proof of quality is obtained when the same IP address is used both in the Internet Time GUI described above and with the w32tm command:

w32tm /stripchart /computer:65.55.21.14 /period:120

3.  Results

The results obtained with w32tm are difficult to interpret. When the offset in time is large (i.e., several seconds), synchronization of the system time seems to happen in one step. In these cases, the remaining offset is typically larger than a few milliseconds. However, when the offset is less than a few seconds, an algorithm gently adjusts the offset in small steps. This procedure can take many hours.

It turns out that obtaining detailed insight into this adjustment algorithm by using w32tm is difficult. A more in-depth investigation may uncover the cause of the behavior observed; however, this requires additional software.

4.  Discussion

Applying the scheme described above frequently gives very dissatisfying results. Sometimes the synchronization results in a time offset that is worse than the offset prior to synchronization. In particular, Windows Vista and Windows 7 show strange behavior, e.g., seemingly never-ending adjustments to huge offsets.

A piece of software is necessary to find out the secret of the adjustment algorithm. Actual system time adjustment parameters can be obtained by a call to the function GetSystemTimeAdjustment because Windows performs the system time adjustment through calls to the function SetSystemTimeAdjustment.

BOOL WINAPI GetSystemTimeAdjustment(OUT PDWORD lpTimeAdjustment,
OUT PDWORD lpTimeIncrement,
OUT PBOOL lpTimeAdjustmentDisabled);

MSDN: "For each lpTimeIncrement period of time that actually passes, lpTimeAdjustment will be added to the time of day." Assuming this rule, the adjustment gain can be calculated:

gain = (lpTimeAdjustment - lpTimeIncrement)/ lpTimeIncrement
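Expressed in C, the gain equation reads as follows. The function name is hypothetical and the result is scaled to ms/s; on Windows, the two arguments would be the values reported by GetSystemTimeAdjustment.

```c
/* Adjustment gain in ms/s according to the gain equation:
   (time_adjustment - time_increment) / time_increment, scaled by 1000.
   Both arguments are in 100-ns units. */
double adjustment_gain_ms_per_s(unsigned long time_adjustment,
                                unsigned long time_increment)
{
    return ((double)time_adjustment - (double)time_increment)
           / (double)time_increment * 1000.0;
}
```

For example, an adjustment of 156257 at an increment of 156250 yields 7/156250, which is 0.0448 ms/s.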

A simple program can call GetSystemTimeAdjustment frequently while a system time adjustment is active and evaluate the gains for individual values of lpTimeAdjustment. The function SetSystemTimeAdjustment allows initiating and controlling a system time adjustment:

BOOL WINAPI SetSystemTimeAdjustment(IN DWORD dwTimeAdjustment,
IN BOOL bTimeAdjustmentDisabled);

System time adjustments occur when bTimeAdjustmentDisabled is set to FALSE and dwTimeAdjustment is set to some meaningful value. Unfortunately, the influence of the values of dwTimeAdjustment depends on the Windows version: The MSDN description of the SetSystemTimeAdjustment function contains the note: "Currently, Windows Vista and Windows 7 machines will lose any time adjustments set less than 16." Note: Windows 8 is not mentioned here; the related knowledge base article KB2537623 does not mention Windows 8 either.

The update scheme of the system time and also the scheme of system time adjustments depend on the presence of a High Precision Event Timer [HPET]. Intel specifies [hpetspec.pdf]: "An existing HPET does not replace the RTC Time of Day, the RTC Alarm, and the RTC CMOS functionality. The HPET architecture supplements/replaces only the RTC Periodic Interrupt function." The RTC (Real Time Clock) Periodic Interrupt function used to be the heartbeat of the system time update. However, an existing HPET will replace this functionality and remove the system time update activity from the RTC periodic interrupt function. Those systems can typically be identified by a specific value of the update period lpTimeIncrement: 156001. HPET and RTC are driven by different hardware. Therefore, they are neither synchronized nor in phase by default; additionally, they may show specific drifts. More information about the evolution of the HPET architecture is given in "Guidelines For Providing Multimedia Timer Support" [MSDN]. Newer systems may provide hardware with an invariant Time Stamp Counter (TSC) as described in section 17.13 of "Intel 64 and IA-32 Architectures, Software Developer’s Manual". Windows has a clear preference about what hardware resource is to be used for timekeeping. When suitable TSC characteristics are obtained, Windows uses the TSC for timekeeping. If the TSC is not suitable, Windows uses the HPET when available, and if that is not available or disabled in the BIOS, Windows uses the ACPI PM timer ("MSDN: Acquiring high-resolution time stamps.").

It was already shown in section 2.3 of Microsecond Resolution Time Services for Windows that the Windows system timing cannot be assumed to show a fixed pattern. The evolution of Windows with newly introduced limitations (... will lose any time adjustments set less than 16.) and emerging new hardware results in a big variety of schemes for system time adjustments. A few relevant combinations are diagnosed and described here.

4.1.  Windows XP and Windows Server 2003: The Classical Case

A call to GetSystemTimeAdjustment reveals a value of 156250 for lpTimeIncrement on most platforms running Windows XP or its server variants (some specific hardware may return other values, e.g., 100144). Note: A value of 156250 represents 15.625 ms, an RTC Periodic Interrupt at 64 Hz. This is a very common hardware fingerprint.

Using the function SetSystemTimeAdjustment with dwTimeAdjustment = 156250 and bTimeAdjustmentDisabled = FALSE shall initiate a system time adjustment. However, according to the gain equation described in section 4, no adjustment shall take place: the gain shall be zero, but the adjustment shall be active with lpTimeAdjustmentDisabled = FALSE.

Setting dwTimeAdjustment to any number different from lpTimeIncrement shall result in a system time adjustment. Example: lpTimeIncrement = 156250 and dwTimeAdjustment = 156257. The system time will advance by 15.6257 ms every 15.6250 ms, so the system time will gain 0.0448 ms/s (7/156250). This way the gains are predictable. A small list shows the gains obtained in the neighborhood of 156250 for dwTimeAdjustment from 156255 to 156248:

156255:  0.032000 ms/s = (156255 - 156250)/156250
156254:  0.025600 ms/s = (156254 - 156250)/156250
156253:  0.019200 ms/s = (156253 - 156250)/156250
156252:  0.012800 ms/s = (156252 - 156250)/156250
156251:  0.006400 ms/s = (156251 - 156250)/156250
156250:  0.000000 ms/s = (156250 - 156250)/156250
156249: -0.006400 ms/s = (156249 - 156250)/156250
156248: -0.012800 ms/s = (156248 - 156250)/156250

These numbers were captured on real hardware. The adjustment gain is zero with dwTimeAdjustment = 156250. The smallest available adjustment on such a platform is 6.4 μs/s (positive and negative).

A similar scan was carried out on hardware reporting lpTimeIncrement = 100144 (dwTimeAdjustment = 100146 to 100142):

100146:  0.01997124 ms/s = (100146 - 100144)/100144
100145:  0.00998562 ms/s = (100145 - 100144)/100144
100144:  0.00000000 ms/s = (100144 - 100144)/100144
100143: -0.00998562 ms/s = (100143 - 100144)/100144
100142: -0.01997124 ms/s = (100142 - 100144)/100144

This hardware also consistently follows the gain equation provided by the MSDN description. However, the smallest adjustment gain on this hardware is almost 10 μs/s.
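The smallest gain follows directly from the gain equation: changing dwTimeAdjustment by one unit changes the gain by 1/lpTimeIncrement. The sketch below computes this step and, as a purely arithmetic illustration, the time needed to remove a given offset at a constant gain. Both helper names are hypothetical, and the sketch says nothing about the gains Windows actually chooses.

```c
/* Smallest adjustment step in us/s for a given lpTimeIncrement
   (100-ns units): one unit of dwTimeAdjustment per increment. */
double min_gain_step_us_per_s(unsigned long time_increment)
{
    return 1e6 / (double)time_increment;
}

/* Seconds needed to remove offset_s at a constant gain (us/s). */
double seconds_to_correct(double offset_s, double gain_us_per_s)
{
    return offset_s / (gain_us_per_s * 1e-6);
}
```

At the smallest step of 6.4 μs/s, removing an offset of 0.1 s would take 15625 s, i.e., more than four hours.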

Windows XP and Windows Server 2003 do not support a hardware HPET. These Windows versions may use Programmable Interrupt Timers (PIT), Real Time Clocks (RTC), the processor's Time Stamp Counter (TSC), and the Power Management Timer (PMTIMER) to mimic what is later done by the High Precision Event Timer (HPET). These Windows versions increment the system time at a fixed period every lpTimeIncrement. This period does not depend on settings of the timer resolution by means of the timeBeginPeriod() function. This is most easily confirmed by polling system file time transitions over a longer period of time with different settings of timeBeginPeriod(). As a result, the granularity of the system time is typically in the range of 10 ms to 20 ms.
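Polling for file time transitions can be sketched as follows. To keep the sketch independent of the platform, the clock is passed in as a function pointer; on Windows it would wrap a call such as GetSystemTimeAsFileTime. The names measure_update_period, fake_file_time and fake_calls are hypothetical, and the fake clock only serves to try the loop.

```c
/* Signature of a clock returning the system file time in 100-ns units. */
typedef long long (*file_time_fn)(void);

/* Measure the average update period of a clock by polling for value
   changes. Returns the mean period over 'transitions' updates. */
long long measure_update_period(file_time_fn now, int transitions)
{
    long long prev, start, t;
    int seen = 0;

    if (transitions <= 0)
        return 0;

    /* Synchronize to the first transition to get a defined start. */
    prev = now();
    do {
        start = now();
    } while (start == prev);

    prev = start;
    while (seen < transitions) {
        t = now();
        if (t != prev) {
            prev = t;
            seen++;
        }
    }
    return (prev - start) / transitions;
}

/* Stand-in clock: advances by 156250 units on every 4th call. */
long long fake_calls = 0;
long long fake_file_time(void)
{
    return (fake_calls++ / 4) * 156250LL;
}
```

Averaging over many transitions reduces the influence of a single delayed poll, but the loop keeps one core busy for the whole measurement.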

4.2.  Windows Vista, Windows 7, Windows 8, and Windows 8.1

Windows Vista introduced HPET support. It was the first public Windows version to decouple the system time update and the system time adjustment from the RTC Periodic Interrupt function or the ACPI PM timer when HPET hardware is present. This was a big step towards higher timing accuracy. However, it also caused some inconsistency with a remarkable drawback for Windows Vista and Windows 7 (KB2537623) persisting until now. Windows Vista also introduced the influence of the multimedia timer resolution (set by timeBeginPeriod) on the update period of the system time: The system time is updated at a period of ActualResolution returned by the function NtQueryTimerResolution.

The following list of system time gains vs. dwTimeAdjustment (156154 to 156330) was taken with Windows Vista on a platform without HPET/TSC support (lpTimeIncrement = 156250):

156154 to 156169 [16 element(s)] gain -0.5120328 ms/s.
156170 to 156185 [16 element(s)] gain -0.4096262 ms/s.
156186 to 156201 [16 element(s)] gain -0.3072197 ms/s.
156202 to 156217 [16 element(s)] gain -0.2048131 ms/s.
156218 to 156233 [16 element(s)] gain -0.1024066 ms/s.
156234 to 156250 [17 element(s)] gain +0.0000000 ms/s.
156251 to 156266 [16 element(s)] gain +0.1024066 ms/s.
156267 to 156282 [16 element(s)] gain +0.2048131 ms/s.
156283 to 156298 [16 element(s)] gain +0.3072197 ms/s.
156299 to 156314 [16 element(s)] gain +0.4096262 ms/s.
156315 to 156330 [16 element(s)] gain +0.5120328 ms/s.

This list discloses some information contained in "... will lose any time adjustments set less than 16...". It seems that it is not losing time adjustments with values less than 16; rather, SetSystemTimeAdjustment ignores the lower 4 bits of dwTimeAdjustment. The obtained gain is the same for all dwTimeAdjustment values in one group. The group size is 16. Only the group ranging from 156234 to 156250 has 17 members. It is not yet clear why the scheme shows this exception. However, the gain equation used for the gain calculation obviously does not apply here. Therefore, MSDN: "For each lpTimeIncrement period of time that actually passes, lpTimeAdjustment will be added to the time of day" becomes incorrect for this configuration. Exception: Gain is zero at dwTimeAdjustment = lpTimeIncrement.

The next list is taken with Windows Vista on a platform with HPET/TSC support (dwTimeAdjustment: 155908 to 156079, lpTimeIncrement = 156001):

155908 to 155922 [15 element(s)] gain -0.5000000 ms/s.
155923 to 155938 [16 element(s)] gain -0.4000000 ms/s.
155939 to 155954 [16 element(s)] gain -0.3000000 ms/s.
155955 to 155969 [15 element(s)] gain -0.2000000 ms/s.
155970 to 155985 [16 element(s)] gain -0.1000000 ms/s.
155986 to 156001 [16 element(s)] gain +0.0000000 ms/s.
156002 to 156016 [15 element(s)] gain +0.1000000 ms/s.
156017 to 156032 [16 element(s)] gain +0.2000000 ms/s.
156033 to 156047 [15 element(s)] gain +0.3000000 ms/s.
156048 to 156063 [16 element(s)] gain +0.4000000 ms/s.
156064 to 156079 [16 element(s)] gain +0.5000000 ms/s.

The periodic interrupt increments the system time and performs the system time adjustment. However, the "lost 4 bits" idea becomes questionable one more time. Groups have either 15 or 16 elements.
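The two Windows Vista lists above can be reproduced by a simple descriptive model. This is only a fit to the listed numbers and not documented Windows behavior: it assumes that the requested adjustment is rounded up to the next multiple of the observed gain step (0.1024066 ms/s without HPET, 0.1 ms/s with HPET). The function names are hypothetical.

```c
/* Round x up to the next integer (ceiling) without needing libm. */
static long ceil_to_long(double x)
{
    long t = (long)x;            /* truncates toward zero */
    return (x > (double)t) ? t + 1 : t;
}

/* Modeled gain in ms/s for a requested adjustment.
   adj, inc:       dwTimeAdjustment and lpTimeIncrement (100-ns units)
   step_ms_per_s:  gain step observed in the lists (assumed input) */
double modeled_gain_ms_per_s(long adj, long inc, double step_ms_per_s)
{
    /* size of one gain step expressed in units of dwTimeAdjustment */
    double unit = step_ms_per_s * (double)inc / 1000.0;
    return (double)ceil_to_long((double)(adj - inc) / unit) * step_ms_per_s;
}
```

With a step of 0.1 ms/s and lpTimeIncrement = 156001, one step corresponds to 15.6001 units, which yields the alternating group sizes of 15 and 16. With 0.1024066 ms/s and 156250, one step is slightly more than 16 units, which is one possible reading of the 17-element group around zero.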

The minimum selectable adjustment gain appears to be coarse on Windows Vista. TSC, HPET, or PM timer configurations show a minimum gain of approx. +/- 0.1 ms/s.

Windows 7 and Windows Server 2008 R2 introduced Timer Coalescing (more detailed: TimerCoal.docx) to "...improve the efficiency of periodic software activity by expiring multiple distinct software timers at the same time...". This portion of software shifts interrupts into groups of interrupts. A requested interrupt is accompanied by a tolerance to tell the OS by how much it is allowed to shift the interrupt in time. This may affect the update of the system time and has to be diagnosed carefully. Windows 7 does not update the system time by fixed increments.

Capturing the adjustment gain on a Windows 7 platform with constant TSC support results in the following list (dwTimeAdjustment: 155908 to 156079, lpTimeIncrement = 156001):

155908 to 155922 [15 element(s)] gain -0.5571681 ms/s.
155923 to 155938 [16 element(s)] gain -0.4571738 ms/s.
155939 to 155954 [16 element(s)] gain -0.3571796 ms/s.
155955 to 155969 [15 element(s)] gain -0.2571853 ms/s.
155970 to 155985 [16 element(s)] gain -0.1571910 ms/s.
155986 to 156001 [16 element(s)] gain -0.0571967 ms/s.
156002 to 156016 [15 element(s)] gain +0.0427976 ms/s.
156017 to 156032 [16 element(s)] gain +0.1427918 ms/s.
156033 to 156047 [15 element(s)] gain +0.2427861 ms/s.
156048 to 156063 [16 element(s)] gain +0.3427804 ms/s.
156064 to 156079 [16 element(s)] gain +0.4427747 ms/s.

Three important results can be drawn from the list above:

This behavior raises the question of whether a specific gain for a specific value of dwTimeAdjustment remains constant over time. Careful evaluation of this matter has not confirmed any variation of the gain (added advancement of the system time) when a constant value of dwTimeAdjustment is applied. Nevertheless, it remains difficult to predict the adjustment gain for values of dwTimeAdjustment on systems affected by this scheme (Windows Vista and Windows 7 with HPET/TSC support). MSDN states: "For each lpTimeIncrement period of time that actually passes, dwTimeAdjustment will be added to the time of day." In this regard, the MSDN claim turns out to be wrong on Windows 7 too. Note: This specific asymmetry occurs with the system's interrupt period set to the minimum by means of e.g. timeBeginPeriod(wPeriodMin).

All software packages using SetSystemTimeAdjustment are in serious danger of relying on predictable gains. It should also be noted that the Windows 7 list above contains no dwTimeAdjustment setting for a gain of 0.0 ms/s. Section 4.1 showed that earlier versions of Windows had a much more predictable scheme. The scheme observed on Windows Vista and Windows 7 requires the software to calibrate itself to the appropriate gain for values of dwTimeAdjustment because the gain cannot easily be derived from the given values of lpTimeIncrement and lpTimeAdjustment.
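Such a self-calibration could, for example, end in a lookup table of measured gains. The Python sketch below is an illustration and not part of the project's code. It copies the Windows 7 list above into a hypothetical table MEASURED_GROUPS and picks the group whose measured gain is nearest to a desired gain. The name best_adjustment is made up for this example, and a real service would have to measure the table on the target machine.

```python
# (first value, last value, measured gain in ms/s), copied from the Windows 7 list
MEASURED_GROUPS = [
    (155908, 155922, -0.5571681),
    (155923, 155938, -0.4571738),
    (155939, 155954, -0.3571796),
    (155955, 155969, -0.2571853),
    (155970, 155985, -0.1571910),
    (155986, 156001, -0.0571967),
    (156002, 156016, +0.0427976),
    (156017, 156032, +0.1427918),
    (156033, 156047, +0.2427861),
    (156048, 156063, +0.3427804),
    (156064, 156079, +0.4427747),
]


def best_adjustment(desired_gain_ms_per_s, groups=MEASURED_GROUPS):
    """Return (dwTimeAdjustment, measured gain) of the nearest available gain.

    Every value inside a group yields the same gain, so the first value
    of the group is returned as its representative.
    """
    first, _last, gain = min(
        groups, key=lambda group: abs(group[2] - desired_gain_ms_per_s))
    return first, gain


# A gain of exactly 0.0 ms/s is not available; the nearest one is returned.
print(best_adjustment(0.0))
print(best_adjustment(-0.3))
```

The residual between the desired and the returned gain is the error that the calling software has to compensate, for example by alternating between two neighboring groups.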

The system time synchronization routines of these newer Windows versions do not seem to take these facts into account. A typical synchronization to an internet time server uses all bits for setting the values of dwTimeAdjustment. This can be easily monitored through frequent use of GetSystemTimeAdjustment. Furthermore, these tools expect the lower 4 bits to be taken into account by the system. Windows calculates a correction scheme ahead of the actual adjustment based on the offset to the network time. Unfortunately, the gains are not set as expected and the predicted scheme messes up the adjustment/synchronization, which results in the synchronization being completely off. This is accompanied by the fact that there is no monitoring of the internet time provider while the system time adjustment progresses. Such an adjustment can run for hours and a big deviation may appear with wrong gain estimates resulting from the synchronization algorithm. Finally, at some point the deviation will be several seconds and the next synchronization will only set the local time to the network time without applying the function SetSystemTimeAdjustment.

Windows 8 has finally fixed this mishap. This list has been captured on a Windows 8 system with constant TSC support:

155995 to 155995 [1 element(s)] produces gain -0.0377952 ms/s.
155996 to 155996 [1 element(s)] produces gain -0.0316960 ms/s.
155997 to 155997 [1 element(s)] produces gain -0.0255968 ms/s.
155998 to 155998 [1 element(s)] produces gain -0.0185976 ms/s.
155999 to 155999 [1 element(s)] produces gain -0.0127984 ms/s.
156000 to 156000 [1 element(s)] produces gain -0.0062992 ms/s.
156001 to 156001 [1 element(s)] produces gain +0.0003000 ms/s.
156002 to 156002 [1 element(s)] produces gain +0.0073991 ms/s.
156003 to 156003 [1 element(s)] produces gain +0.0130983 ms/s.
156004 to 156004 [1 element(s)] produces gain +0.0200975 ms/s.
156005 to 156005 [1 element(s)] produces gain +0.0257967 ms/s.
156006 to 156006 [1 element(s)] produces gain +0.0327959 ms/s.
156007 to 156007 [1 element(s)] produces gain +0.0389951 ms/s.

The missing resolution for the value of dwTimeAdjustment is gone: each value has its own gain and the gain is close to the predicted gain (Example: 156003: (156003 - 156001)/156001 = 0.0128 ms/s). The deviation of the gains shown in this list is a result of the changes in Windows 8 timekeeping. Windows 8 does not increment the system time by constant increments; rather, it applies a variety of increments to achieve a desired mean increment. As a consequence, the above measurement would have to be taken over many more periods to show results with smaller deviations. However, it is obvious that the described adjustment scheme is fulfilled with Windows 8.
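How close the measured gains are to the predicted ones can be checked with a few lines of Python. This is a sketch for illustration: the dictionary MEASURED simply repeats the Windows 8 list above, and the function names are not taken from any existing tool.

```python
TIME_INCREMENT = 156001  # lpTimeIncrement of the Windows 8 list (100 ns units)

# dwTimeAdjustment -> measured gain in ms/s, copied from the list above
MEASURED = {
    155995: -0.0377952, 155996: -0.0316960, 155997: -0.0255968,
    155998: -0.0185976, 155999: -0.0127984, 156000: -0.0062992,
    156001: +0.0003000, 156002: +0.0073991, 156003: +0.0130983,
    156004: +0.0200975, 156005: +0.0257967, 156006: +0.0327959,
    156007: +0.0389951,
}


def predicted_gain_ms_per_s(adjustment, increment=TIME_INCREMENT):
    """Gain according to the MSDN description, in ms/s."""
    return (adjustment - increment) / increment * 1000.0


def mean_step_ms_per_s(measured):
    """Average gain change per unit of dwTimeAdjustment (end points only)."""
    low, high = min(measured), max(measured)
    return (measured[high] - measured[low]) / (high - low)


def largest_deviation_ms_per_s(measured):
    """Largest absolute difference between measured and predicted gain."""
    return max(abs(gain - predicted_gain_ms_per_s(adjustment))
               for adjustment, gain in measured.items())


print(f"mean step:         {mean_step_ms_per_s(MEASURED):.7f} ms/s per unit")
print(f"predicted step:    {1000.0 / TIME_INCREMENT:.7f} ms/s per unit")
print(f"largest deviation: {largest_deviation_ms_per_s(MEASURED):.7f} ms/s")
```

For this list the mean step is roughly 0.0064 ms/s per unit and no single value deviates from the prediction by more than about 0.001 ms/s (1 μs/s).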

As of Windows 8.1, timekeeping has again undergone some modifications. The same hardware now reports 156250 for lpTimeIncrement. The list of gains appears as follows:

156244 to 156244 [1 element(s)] produces gain -0.0382037 ms/s.
156245 to 156245 [1 element(s)] produces gain -0.0316040 ms/s.
156246 to 156246 [1 element(s)] produces gain -0.0259043 ms/s.
156247 to 156247 [1 element(s)] produces gain -0.0184048 ms/s.
156248 to 156248 [1 element(s)] produces gain -0.0124051 ms/s.
156249 to 156249 [1 element(s)] produces gain -0.0066054 ms/s.
156250 to 156250 [1 element(s)] produces gain +0.0004942 ms/s.
156251 to 156251 [1 element(s)] produces gain +0.0060939 ms/s.
156252 to 156252 [1 element(s)] produces gain +0.0135934 ms/s.
156253 to 156253 [1 element(s)] produces gain +0.0188931 ms/s.
156254 to 156254 [1 element(s)] produces gain +0.0253928 ms/s.
156255 to 156255 [1 element(s)] produces gain +0.0324924 ms/s.
156256 to 156256 [1 element(s)] produces gain +0.0385920 ms/s.

Windows 8.1 has finally returned to the original Windows heartbeat of 64 Hz (1/15.625 ms). Each value of dwTimeAdjustment produces an individual gain and the result follows the documentation.
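lpTimeIncrement is given in units of 100 ns, so the heartbeat follows directly from it. A small Python sketch of this arithmetic (the function names are chosen for this example):

```python
def interrupt_period_ms(time_increment):
    """Convert lpTimeIncrement (100 ns units) to a period in milliseconds."""
    return time_increment / 10000.0


def heartbeat_hz(time_increment):
    """Convert lpTimeIncrement (100 ns units) to a frequency in Hz."""
    return 1e7 / time_increment


for increment in (156250, 156001):
    print(f"lpTimeIncrement {increment}: "
          f"{interrupt_period_ms(increment):.4f} ms, "
          f"{heartbeat_hz(increment):.3f} Hz")
```

The value 156250 corresponds to 15.625 ms or 64 Hz, while the value 156001 seen on Windows Vista, 7 and 8 corresponds to 15.6001 ms, slightly above 64.1 Hz.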

The system time adjustment will take care that the system time will progress by TimeAdjustment during TimeIncrement. This effectively happened with Windows XP. Since Windows 8 (on specific hardware also since Windows 7) this process may also appear as a progress in smaller steps, depending on the setting of the timer resolution. When the timer resolution is set to maximum resolution (see section 2.1. of Microsecond Resolution Time Services for Windows), the obtained increments are in the same order of magnitude as the timer resolution. However, Windows 8 and Windows 8.1 maintain the average progress of TimeAdjustment during TimeIncrement.

Additional information:

MSDN: "The W32Time service cannot reliably maintain sync time to the range of 1 to 2 seconds. Such tolerances are outside the design specification of the W32Time service." [Support boundary to configure the Windows Time service for high accuracy environments]

MSDN: "If the time difference between the local clock and the selected accurate time sample (also called the time skew) is too large to correct by adjusting the local clock rate, the time service sets the local clock to the correct time." [How the Windows Time Service Works]

4.3.  Monitoring an NTP time provider

A much more detailed view of the system time adjustment can be obtained when the local time is compared to a precise remote time while the system time adjustment is active. The accuracy of w32tm.exe is simply too poor to extract meaningful results. Also, the accuracy of time.windows.com is unsatisfactory.

In order to facilitate a closer look at the problems described above, an NTP (Network Time Protocol) client was added to the time services and the user interface was extended by an NTP Offset tab. This makes it possible to see how the local time progresses against a reference time.

The calibrated performance counter frequency receives an additional correction when a system time adjustment is active. The system time adjustment forces the local time to advance slower or faster, thus the performance counter frequency has to be corrected in a way that takes the modified duration of the "second" during the adjustment into account (see section 2.1.3. of Microsecond Resolution Time Services for Windows). Consequently, an applied system time adjustment becomes visible in the "Calibrated Performance Counter Frequency Offset" tab. As of version 1.2, the calibrated performance counter frequency offset is given in ppm. It is referenced to the value given by QueryPerformanceFrequency() and scaled to show deviation in parts per million. This corresponds to μs/s. This way applied system time adjustment gains will directly show in the plot with real numbers.
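The scaling to ppm can be written down in one line. The Python sketch below is an illustration; the function name and the example frequencies are assumptions and not values taken from the project.

```python
def frequency_offset_ppm(calibrated_hz, nominal_hz):
    """Offset of the calibrated frequency relative to the nominal frequency.

    nominal_hz stands for the value reported by QueryPerformanceFrequency(),
    calibrated_hz for the frequency obtained by calibration.
    The result is in parts per million; 1 ppm equals 1 microsecond per second.
    """
    return (calibrated_hz - nominal_hz) / nominal_hz * 1e6


# Hypothetical example: a 10 MHz counter that calibrates 200 Hz too high.
print(f"{frequency_offset_ppm(10_000_200.0, 10_000_000.0):+.1f} ppm")
```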

The user interface now also provides two checkboxes. When the NTP checkbox is checked, NTP monitoring is activated. The "Autoadjust" checkbox enables permanent synchronization of the local time to a network time:

Fig. 4.3.1: GUI V1.2 with NTP Offset tab, NTP and Autoadjust checkboxes, and NTP/Autoadjust status lines.

The NTP status and the current offset to the network time are reported at the bottom in the NTP status line. Another status line contains information about the automatic adjustment (see section 4.4 for more information on automatic adjustment).

The following two plots were captured when a system time adjustment was triggered by Windows XP:

Fig. 4.3.2: System time adjustment mapped to performance counter frequency (Windows XP).

Fig. 4.3.3: NTP Offset during the adjustment (Windows XP).

Fig. 4.3.2 shows that the performance counter frequency offset jumps to about 140 ppm. This corresponds to an initial adjustment gain of 120 μs/s because the initial offset was already 20 ppm. The gain was reduced in steps over a long period of time (the total adjustment lasted from 8:46 to around 16:00). In the first part, the gain was reduced after about the same time until about 11:33. At that point, the granularity of dwTimeAdjustment prohibited smaller steps and the time between the modifications of dwTimeAdjustment was extended. This way, the target could be approached with a decreasing adjustment speed. The last step from about 13:50 represents dwTimeAdjustment = 156250. The system time adjustment was still enabled; however, the gain was 0.0 ms/s. At this point, the system drifted with its own drift rate.

Typical drifts of local time are in the area of a few μs/s. However, the smallest gain obtainable on Windows XP is 1/156250 = 6.4 μs/s. In practice, the drift may be higher than the smallest gain setting. This way, a final adjustment step may not move in the desired direction. This can be seen in Fig. 4.3.3. As mentioned, the whole scheme of how and when the various gain settings are applied is worked out ahead of the actual adjustment; however, the local drift can add a considerable offset when the adjustment takes many hours.

As described in 4.2, a lot can fail during an adjustment on newer Windows versions. The following plot was recorded during an adjustment on Windows 7:

Fig. 4.3.4: Calibrated performance counter frequency during a system time adjustment (Windows 7).

The initial offset is about -40 ppm. The jump to 540 ppm indicates an initial gain of about 580 ppm or μs/s. Due to poor resolution (granularity of gain), the sign of the adjustment gain changes after just 2 steps and remains there for a long time (at least for another day). This is a typical example of a failing system time adjustment on a Windows 7 system. The offset time is basically the sum of the adjustments and is completely messed up (large negative offset) during this attempt.

Windows 8 has fixed the limited resolution of dwTimeAdjustment and shows adjustments comparable to Windows XP. The following two plots show a system time adjustment initiated by the Windows 8 internet time GUI:

Fig. 4.3.5: Calibrated performance counter frequency during a system time adjustment (Windows 8).

Fig. 4.3.6: NTP offset during a system time adjustment (Windows 8).

NTP monitoring was enabled at 10:15:15. From this point in time no adjustment was active; the system drifted at about 14.4 μs/s until 10:33:13, when the NTP offset reached 0.5 s (500 ms) and the system time adjustment was enabled. The procedure was performed by Windows in 11 steps, starting with dwTimeAdjustment = 156014:

156014 from 10:33:13 to 10:50:17, duration: 1024 s, gain = +83.333 μs/s,
gained +85.333 ms, remaining offset: +414.7 ms

156012 from 10:50:17 to 11:07:21, duration: 1024 s, gain = +70.512 μs/s,
gained +72.204 ms, remaining offset: +342.5 ms

156010 from 11:07:21 to 11:24:25, duration: 1024 s, gain = +57.692 μs/s,
gained +59.077 ms, remaining offset: +283.4 ms

156008 from 11:24:25 to 11:41:30, duration: 1025 s, gain = +44.872 μs/s,
gained +45.994 ms, remaining offset: +237.4 ms

156007 from 11:41:30 to 11:58:34, duration: 1024 s, gain = +38.461 μs/s,
gained +39.384 ms, remaining offset: +198.0 ms

156006 from 11:58:34 to 12:15:37, duration: 1023 s, gain = +32.051 μs/s,
gained +32.788 ms, remaining offset: +165.2 ms

156005 from 12:15:37 to 12:32:41, duration: 1024 s, gain = +25.641 μs/s,
gained +26.256 ms, remaining offset: +138.9 ms

156004 from 12:32:41 to 13:06:49, duration: 2048 s, gain = +19.231 μs/s,
gained +39.385 ms, remaining offset: +99.5 ms

156003 from 13:06:49 to 13:58:02, duration: 3072 s, gain = +12.820 μs/s,
gained +39.383 ms, remaining offset: +60.1 ms

156002 from 13:58:02 to 15:06:17, duration: 4095 s, gain = +6.410 μs/s,
gained +26.249 ms, remaining offset: +33.8 ms

156001 from 15:06:17 to ?

The list shows the progress of the adjustment for each setting of dwTimeAdjustment followed by the period of time during which dwTimeAdjustment was active. The gain was calculated using the expression given in section 4. From this, the adjustment contribution and the remaining offset were calculated. The adjustment scheme looks identical to the scheme observed on Windows XP. Presumably no changes have been made to the system's adjustment tool. However, some more details can be extracted from the list above:

The observed offset at the end of the active adjustment was approx. 270 ms. The total adjustment time was 16386 s (10:33:11 to 15:06:17, 16 x 1024 s). The system's drift was 14.4 μs/s. At a drift rate of 14.4 μs/s the system drifted by 235.36 ms over the 16386 seconds. The difference to the observed offset of 270 ms is 34.64 ms. This corresponds to the remaining offset derived from the adjustment progress table.

This evidently shows that Windows calculates an adjustment scheme based on a one-time offset measurement ahead of the actual adjustment. Unfortunately, the scheme captured here leaves a remarkable remaining offset because the drift is not taken into account at any time. As a result, an adjustment like the one shown here may take several hours to bring the offset into the range of a few milliseconds, and after about the same time the offset is back where it was prior to the attempt to adjust.
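The adjustment progress list can be replayed with a few lines of Python. The sketch below is an illustration, not the tool used to produce the list: it takes the dwTimeAdjustment values and durations from the list, assumes the starting offset of 500 ms and lpTimeIncrement = 156001, and applies gain = (dwTimeAdjustment - lpTimeIncrement) / lpTimeIncrement. The local drift is deliberately left out, as it is in the scheme worked out by Windows.

```python
TIME_INCREMENT = 156001    # lpTimeIncrement in 100 ns units
INITIAL_OFFSET_MS = 500.0  # NTP offset when the adjustment was enabled

# (dwTimeAdjustment, duration in seconds), copied from the list above
STEPS = [
    (156014, 1024), (156012, 1024), (156010, 1024), (156008, 1025),
    (156007, 1024), (156006, 1023), (156005, 1024), (156004, 2048),
    (156003, 3072), (156002, 4095),
]


def replay(steps, offset_ms=INITIAL_OFFSET_MS, increment=TIME_INCREMENT):
    """Return (adjustment, gain in us/s, gained ms, remaining ms) per step."""
    rows = []
    for adjustment, duration_s in steps:
        gain_us_per_s = (adjustment - increment) / increment * 1e6
        gained_ms = gain_us_per_s * duration_s / 1000.0
        offset_ms -= gained_ms
        rows.append((adjustment, gain_us_per_s, gained_ms, offset_ms))
    return rows


for adjustment, gain, gained, remaining in replay(STEPS):
    print(f"{adjustment}: gain {gain:+8.3f} us/s, "
          f"gained {gained:+7.3f} ms, remaining offset {remaining:+6.1f} ms")
```

The replay ends with a remaining offset of roughly 34 ms, in line with the +33.8 ms in the list; the small difference stems from rounding in the individual list entries.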

Larger offsets are not adjusted using such a scheme. An offset of say 10 seconds is simply corrected by setting the system time in one shot. This produces a jump in time which may be confusing to software, particularly when the jump in time is backwards.

4.4.  Synchronizing to an NTP time provider

Windows broadcasts a WM_TIMECHANGE message to all top level windows when a system time change occurs. This can be used to detect changes of system time but it requires a window. However, there is no notification when the system time is adjusted. As a result, the system time changes gradually without any notification other than the actual changes in the flow of time. The only way to detect such an adjustment, and to estimate its current state, is to call GetSystemTimeAdjustment frequently. This is an obvious drawback.
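A polling loop of this kind might look as follows. This is a minimal Python sketch and not the implementation used by the Windows Timestamp Project. The helper read_adjustment_win32 wraps the real GetSystemTimeAdjustment call through ctypes and only works on Windows; watch_adjustment, its parameters and the polling interval are assumptions made for this example.

```python
import time


def read_adjustment_win32():
    """Windows only: return (adjustment, increment, disabled)."""
    import ctypes
    from ctypes import wintypes

    adjustment = wintypes.DWORD()
    increment = wintypes.DWORD()
    disabled = wintypes.BOOL()
    ok = ctypes.windll.kernel32.GetSystemTimeAdjustment(
        ctypes.byref(adjustment), ctypes.byref(increment), ctypes.byref(disabled))
    if not ok:
        raise ctypes.WinError()
    return adjustment.value, increment.value, bool(disabled.value)


def watch_adjustment(read, on_change, polls, interval_s=1.0):
    """Call read() `polls` times and report every change of its result.

    read      -- callable returning the current adjustment state
    on_change -- callable(old_state, new_state); old_state is None at first
    """
    last = None
    for _ in range(polls):
        current = read()
        if current != last:
            on_change(last, current)
            last = current
        time.sleep(interval_s)
```

On a Windows machine, watch_adjustment(read_adjustment_win32, print, polls=60) would print the state once and then again whenever it changes during the next minute. Changes that start and end between two polls go unnoticed, which is the drawback mentioned above.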

Time control with high accuracy, as proposed by the Windows Timestamp Project, cannot accept the uncertainties and inaccuracies described here. The proposed solution is continuous synchronization of the system time to a network time using NTP. This automatic adjustment can be enabled by checking the "Autoadjust" checkbox of the GUI (Fig. 4.3.1). Synchronizations of the local time may still occur asynchronously when scheduled by the operating system; however, the service described here is capable of detecting and canceling them. Nevertheless, disabling the automatic synchronization provided by Windows (see the Windows GUI in section 1) is recommended in order to obtain the greatest accuracy.

The following graph shows a Windows 8 system:

Fig. 4.4.1: Drift and autoadjust on a Windows 8 system.

NTP monitoring was started at around 18:16 and the local time drifted at a rate of about -14.2 μs/s. The NTP offset increased from around 0.0005 s to around 0.015 s within the next 19 min (green plot line). At about 18:35, the autoadjust was enabled and the local time was synchronized to the network time.

The effect of the system time adjustment on the performance counter frequency has been described in section 4.3. The plot of the calibrated performance counter offset for the adjustment shown in Fig. 4.4.1 is given below:

Fig. 4.4.2: Adjustment steps on a Windows 8 system.

Fig. 4.4.1 shows that the network time is running faster and the local time loses about 14 μs/s. Positive gains are required to catch up with the network time. The time service started by applying the smallest positive gain with dwTimeAdjustment = 156002. This resulted in a gain of 0.0064 ms/s. Afterwards, the value of dwTimeAdjustment was incremented periodically. At a value of 156051, the gain increased to 0.3205 ms/s. The dwTimeAdjustment was decremented periodically after half of the desired offset was adjusted. A positive gain causes the system time to progress faster; the calibrated performance counter frequency consequently gets lowered with positive gains. As already mentioned, the calibrated performance counter offset is normalized to the performance counter frequency given by the system to show ppm. As a result the plot effectively shows negated gain values (e.g. a gain of +18.2 μs/s will show as -18.2 ppm).
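The ramp described above (increment dwTimeAdjustment periodically, then decrement it once half of the offset is corrected) can be modelled roughly in Python. The sketch is hypothetical and not the project's implementation: the step period of one second, the step size of one unit and the neglect of the local drift are assumptions made for this example, as is the function name.

```python
def triangular_schedule(offset_s, increment=156001, step_period_s=1.0):
    """Return (list of dwTimeAdjustment values, total correction in seconds).

    offset_s > 0 means the local time is behind the network time, so
    positive gains (values above `increment`) are needed.
    ASSUMPTION: one value is applied for step_period_s, then changed by 1.
    """
    direction = 1 if offset_s > 0 else -1
    target = abs(offset_s)
    step_gain_s = step_period_s / increment  # correction per unit and period
    corrected = 0.0
    units = 0
    schedule = []
    # Ramp up until half of the offset has been corrected.
    while corrected < target / 2:
        units += 1
        corrected += units * step_gain_s
        schedule.append(increment + direction * units)
    # Ramp down again by one unit per period.
    while units > 1:
        units -= 1
        corrected += units * step_gain_s
        schedule.append(increment + direction * units)
    return schedule, direction * corrected


schedule, corrected = triangular_schedule(0.015)
print(f"steps: {len(schedule)}, peak: {max(schedule)}, "
      f"corrected: {corrected * 1000:.3f} ms")
```

With an offset of 15 ms and the assumed period of one second, the sketch peaks at 156049, in the neighborhood of the 156051 reported above. Because the drift is ignored here, a real service would have to keep measuring the NTP offset while the ramp is running.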

The continuous adjustment results in a mean offset of the network time to local time in the range of a few hundred microseconds. However, this may be affected by network bandwidth and/or NTP server quality. The network time server pool used here is pool.ntp.org (it is highly recommended that the information provided on this site be read). The accuracy of this server significantly outperforms the accuracy of time.windows.com. The available bandwidth is essential for very high accuracy. Heavy traffic on the network connection may temporarily drop the level of accuracy to within a few milliseconds.

The next graph shows a continuous adjustment interrupted by a three-minute drift phase to highlight the narrow band in which the NTP offset is held during the adjustment:

Fig. 4.4.3: Windows 7: Continuous adjustment interrupted by a three-minute drift phase.

This figure was taken as a screenshot of the GUI to show the estimated local drift. This local drift can be estimated from the mean of the applied gains after a few minutes of continuous operation of "autoadjust". Its value appears in the "all output" tab and at the end of the NTP status line when available.

The quality of adjustment becomes visible when the network time offset drifts. In just three minutes, the offset drifted to about 2.7 ms. If high accuracy is required, it is not only necessary to synchronize the local time to a network time periodically; it is essential to synchronize it continuously.

4.5.  Conclusions

Windows synchronization to a network time reference has proved to be not very accurate. In particular, Windows Vista and Windows 7 seem to have lost some of the capabilities for some unknown reason. Unfortunately, there is not much information on this issue and the little information available basically says that Windows time synchronization should not be expected to be more accurate than a few seconds and that there may be a mishap in the behavior of SetSystemTimeAdjustment with respect to the meaning of the value of dwTimeAdjustment. Only Windows 8 has now overcome these drawbacks and its system time adjustment performs like it did on Windows XP.

Unfortunately, there are still many NTP synchronization packages around which operate under the assumption of the current MSDN description that "For each lpTimeIncrement period of time that actually passes, lpTimeAdjustment will be added to the time of day". Evidently, this assumption is not true for Windows VISTA and Windows 7. These versions need software that is capable of dealing with the artifacts described here to set the system time correctly to obtain good accuracy.

Offsets of system time may drift seconds per day. Even on systems with a low drift rate the drift can easily reach half a second per day. This can only be overcome by a correction of the system's knowledge of its clock frequency. Newer Windows versions calibrate the performance counter frequency (result of QueryPerformanceFrequency) at boot time when operating with TSC and/or HPET. This was initially done by Windows 7 and has improved with Windows 8. But there does not seem to be an on-the-fly correction of this value while a network time synchronization occurs. This is basically the reason for the noticeable drift and the need for a continuous adjustment. Windows 8.1 has not shown any improvements with respect to the "built-in" system time adjustment.


A pdf version of "Part II: Adjustment of System Time" can be downloaded here.