As my company works to deploy XenServer, I was tasked with setting up a new resource pool so one of the business units could test their build system in a virtualized environment.

After I set up several Red Hat 4.8 VMs, the requester came to me with a concern that the system times were going wonky. Builds would fail because licensed software stopped functioning when the clock was wrong.

A few date commands showed the clocks on all VMs were running fast. I checked the time servers on our internal networks and the times on the hosts, and both were correct. Only the VMs had this problem.
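To put a number on "running fast", one rough approach is to record `date +%s` on a machine with a known-good clock and on the VM at the same moments, then compare how much each clock advanced. The sketch below assumes you have collected such pairs by hand; the function name and the sample readings are made up for illustration and are not from the actual incident.

```python
def drift_seconds_per_hour(samples):
    """Estimate how fast a VM clock gains (or loses) time.

    samples: list of (reference_epoch, vm_epoch) pairs, oldest first,
    e.g. the output of `date +%s` taken on a trusted host and on the VM
    at the same moment.
    """
    (ref0, vm0), (ref1, vm1) = samples[0], samples[-1]
    elapsed = ref1 - ref0
    if elapsed <= 0:
        raise ValueError("samples must span a positive amount of reference time")
    # Seconds the VM clock advanced beyond what the reference clock did.
    gained = (vm1 - vm0) - elapsed
    return gained * 3600.0 / elapsed


# Hypothetical readings: the VM gains 30 seconds every 10 minutes.
samples = [(1000, 1000), (1600, 1630), (2200, 2260)]
rate = drift_seconds_per_hour(samples)
print("VM clock gains %.1f seconds per hour" % rate)
```

A positive result means the VM clock is racing ahead of the reference; a value near zero is what a healthy system should show.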

Of course, there were no error messages to suggest the cause of the problem. Google turned up a few posts indicating this was an old problem.

If this were a bug in 4.8, we should see it everywhere. A quick check of the bare metal 4.8 servers showed no system time issues. The problem only happened on VMs.

One of the solutions I found was to play with the kernel boot options:

clock=pmtmr divider=10

I attempted the above change but the time kept racing forward.

We opened a trouble ticket with Red Hat, and they advised that the option I had used was meant for a 32-bit operating system. For a fully virtualized 64-bit RHEL 4.0, we needed to use:

notsc divider=10

They also warned that although this would eliminate extreme drift, there was still a chance of slight drift. I tried the option, but time still raced forward.

Red Hat then suggested we use the Xen kernel with the above option, but again time raced forward.

At this point Red Hat said it was a technical limitation in RHEL 4 and would not be fixed. I did not accept this answer since there wasn’t an official statement or tech note. Basically somebody didn’t want to work and hoped I would go away.

Of course I took this as a challenge, since the question remained: why would time be correct on a bare metal system and not on a virtualized one?

I decided to play with a few options:

notsc divider=10 lpj=n

Failed.

clock=pmtmr notsc divider=10

Failed.

notsc divider=10

Failed.

At this point I wondered if the problem was truly related to NTP. I disabled the NTP daemon and rebooted. The clock still raced forward!

I started looking at other options on the system. The hosts are HP DL360 G7s. I looked at the CPU options through XenServer and decided to play with a few, but nothing addressed the problem.

While searching the Net, I happened to stumble on a bug report about high precision timers.

I thought “why not?” and tried the kernel option “nohpet” to disable the high precision timer, and voila! Time stopped racing forward!

The final option ended up being

nohpet notsc divider=10
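For anyone applying this to several VMs, here is a minimal sketch of adding the options to the kernel line of a GRUB legacy config, which on RHEL 4 normally lives in /boot/grub/grub.conf. The helper name and the sample stanza (including the EXAMPLE kernel and initrd names) are placeholders I made up for illustration; back up the real file and review the result before rebooting.

```python
def add_kernel_options(grub_conf_text, options):
    """Return grub.conf text with the given options appended to every
    'kernel' line that does not already carry them."""
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel "):
            present = stripped.split()
            # Only add what is missing, so running this twice is harmless.
            missing = [opt for opt in options if opt not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical stanza; real kernel and initrd names will differ.
sample = """title Red Hat Enterprise Linux (example)
\troot (hd0,0)
\tkernel /vmlinuz-EXAMPLE ro root=LABEL=/
\tinitrd /initrd-EXAMPLE.img
"""

options = ["nohpet", "notsc", "divider=10"]
result = add_kernel_options(sample, options)
print(result)
```

The options only take effect after a reboot. Once the VM is back up, `cat /proc/cmdline` shows whether the kernel actually booted with them.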

The business unit ran their tests without any issues.

Lesson of the day: Answers can come from issues not related to the problem.
