Analyzing a Slow Exchange 2003 Server

Originally published on August 16, 2006 by Dirk Paessler
Last updated on January 23, 2024 • 7 minute read

You have been there: every server seems to get slower over time. Always. But is the slowdown real? Does it actually hurt your business? And what can you do about it? On Windows servers there are several possible reasons for a slowdown over time:

fragmented disks
overflowing TEMP folders
processes that eat more and more RAM
too many processes on a system, or CPU-intensive processes
hardware problems
faulty software

Most of these issues can be felt when working directly on the system (e.g. using Remote Desktop), but they may not have any impact on the services the server provides.

Finding Out Whether a Server REALLY Gets Slower

Only long-term monitoring of the server's services can really tell you whether there is a slowdown. If no slowdown can be measured, looking further into possible causes may be a waste of time. Let's take a sample server that runs IIS, a DNS server and Exchange Server. We would suggest monitoring the following items using IPCheck Server Monitor:

PING
DNS Requests
HTTP sensors for IIS websites
HTTP sensors for the Outlook Web Access site
SMTP sensor for Exchange Server's mailserver
POP3 sensor for Exchange Server's mailserver
SNMP Traffic Sensor to track bandwidth usage
SNMP Advanced sensor for CPU Load
SNMP Advanced sensor for System Context Switches
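To illustrate what such a sensor measures, here is a minimal sketch in Python that times how long it takes to open a TCP connection to a service. It is not how IPCheck Server Monitor works internally; the function names, the `CHECKS` table and the port selection are assumptions made for this example, and a real sensor would also speak the protocol (HTTP, SMTP, POP3) instead of only connecting.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Return the time in milliseconds needed to open a TCP connection,
    or None if the connection fails or times out."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it again right away
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


# Hypothetical check table: standard ports for the services named above.
CHECKS = {"HTTP": 80, "SMTP": 25, "POP3": 110}


def run_checks(host, checks=CHECKS):
    """Measure every configured service once and return {name: ms or None}."""
    return {name: tcp_connect_ms(host, port) for name, port in checks.items()}
```

Calling `run_checks` at a fixed interval and storing the results with a timestamp gives the kind of long-term series that the rest of this article relies on.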

When you monitor these parameters over a longer time (e.g. several weeks) you will notice that most services actually do not degrade in speed at all, even though it may feel that way in Remote Desktop. One aspect is psychology: we are used to ever faster systems, but servers in particular are not replaced as often as workstations ("hmm, this server seemed to be faster a while ago"). Another aspect is that Windows servers give background processes much better caching and memory management than the user interface, which is obviously a good idea. And finally, most server processes are programmed to cache extensively so that they can answer most requests from memory and minimize slow disk access.

Let's Look at a Sample

But you may discover that a service really has become slower over the last few weeks! Let's have a look at a sample graph:

This graph shows the request times for the homepage of Outlook Web Access, the web interface of Exchange Server 2003. You can see that until March 2006 the service had weekly peaks (which occur during the backup of the server). But starting in March 2006 response times jumped to 3-4 times the long-term average, and in May 2006 the situation got even worse. This prompted our administrators to look into the problem.
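A jump like this can also be detected from the recorded numbers instead of by eye. The following is a minimal sketch, assuming the request times are available as plain lists of milliseconds; the function names and the threshold of 2.0 are choices made for this example. It uses the median rather than the mean so that occasional peaks, such as the weekly backup, do not distort the comparison.

```python
from statistics import median


def slowdown_factor(baseline_ms, recent_ms):
    """Ratio of the recent median request time to the long-term median."""
    return median(recent_ms) / median(baseline_ms)


def is_degraded(baseline_ms, recent_ms, threshold=2.0):
    """True if recent request times are at least `threshold` times the baseline."""
    return slowdown_factor(baseline_ms, recent_ms) >= threshold
```

With invented sample values, a baseline of `[210, 190, 205, 900, 200]` and recent measurements of `[640, 700, 655, 1500, 690]` give a factor of about 3.4, which is in the range described above and well over the threshold.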

Looking for the Cause of the Problem

We looked at all the sensors mentioned above and did not find any good clue as to what was going wrong. The OWA sensor was the only sensor that had slowed down; everything else (CPU load, bandwidth usage, IIS, DNS server) looked normal. We concluded that these parameters would not help us find the problem, so we took the following steps to find the cause:

First we rebooted the server to stop any processes that might be running wild (the server had not been rebooted for 9 weeks).
Using Windows Task Manager and Process Explorer from Sysinternals, we checked whether any processes were using too much memory or too many CPU cycles.
We cleaned out TEMP folders and unnecessary files on the disks.
We applied all the latest updates for the OS and for Exchange Server.
Using a defragmenter (O&O Defrag Server) we defragmented the disks.
We increased the size of the paging file to have more virtual memory.
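Before cleaning out a TEMP folder it can be useful to know how much it actually holds. Here is a minimal sketch, assuming Python is available on the machine; `directory_size` is a helper name invented for this example, and `tempfile.gettempdir()` returns the temp folder of the current user, which is not necessarily the only TEMP folder on a server.

```python
import os
import tempfile


def directory_size(path):
    """Return the total size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is not accessible; skip it
    return total


if __name__ == "__main__":
    temp_dir = tempfile.gettempdir()
    size_mb = directory_size(temp_dir) / (1024 * 1024)
    print(f"{temp_dir}: {size_mb:.1f} MB")
```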

None of these actions changed the slow behaviour of the Outlook Web Access server. By now the users inside the LAN who use Outlook (which talks directly to the Exchange server) were also complaining about slow response.

Gotcha: Luck Helped to Solve the Problem

Finally we got closer to the problem with a little luck. The Exchange Server used 5 different mailbox stores to store emails. During our search we dismounted them one at a time for a few minutes each, and when one particular store was dismounted the server suddenly became responsive again. We looked into that specific mailbox store and found seven email accounts. Step by step we moved each account to another mailbox store and let the machine run for a few minutes. This way we finally found the mailbox that caused the overload. It was the only mailbox on the system that was accessed via IMAP from a Mac mail client. As soon as this mail client was switched to POP3 access instead of IMAP, our server was back to normal. Apparently the IMAP implementation of the mail client somehow caused the Exchange Server to run wild.
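The search described here is a simple elimination: take one candidate out of service, check whether the server recovers, and put it back if nothing changes. A minimal sketch of that loop follows. It does not call any Exchange API; `disable`, `restore` and `is_healthy` are hypothetical callbacks that stand for the manual steps (dismounting a store or moving a mailbox, waiting a few minutes, checking the response time).

```python
def find_culprit(candidates, disable, restore, is_healthy):
    """Disable one candidate at a time and return the first one whose
    removal makes the service healthy again, or None if none does.
    The culprit is left disabled; all other candidates are restored."""
    for candidate in candidates:
        disable(candidate)
        if is_healthy():
            return candidate
        restore(candidate)
    return None
```

In the case above the loop would run twice: first over the five mailbox stores, then over the seven accounts in the store that was identified.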

Conclusion

This story offers three lessons to be learned:

Don't rely on your intuition to judge whether a system is getting slower. Monitor!
Long-term performance data can help you find the cause of problems.
Sometimes finding a server's hidden problem takes a little luck. Keep searching!
