So before I can show you what I am doing to fix our SCOM environment, I need to tell you a little about what we did wrong and what issues we are seeing because of it.
SCOM uses Management Packs for everything it does.
Management Packs (MPs) contain predefined monitoring settings that enable agents to monitor a specific service or application in Operations Manager 2007. These predefined settings include discovery information that allows management servers to automatically detect and begin monitoring objects, a knowledge base that contains error and troubleshooting information, alerts, and reports.
In short, MPs are collections of rules and alerts for a specific object. Microsoft provides many MPs for monitoring things like the Windows Server 2008 OS or SQL Server. Some third-party vendors provide management packs as well. For example, we have a lot of Dell servers in our environment, so we installed the Dell server MP. This MP monitors physical system health and alerts us when a failure has occurred.
When you first install SCOM, it installs around 40 default MPs. These MPs are a core part of SCOM. You then have the option to install additional MPs, which are application specific: Windows, SQL, Exchange, etc. It is a best practice to install only the MPs you need, and to install them in a controlled manner so that each one can be configured and overridden as it goes in.
In my environment we have 140 MPs installed, with very few overrides configured. We also have anything with a critical status set to generate an email to the System Admins. With that many untuned MPs, the result is a flood of garbage emails.
The second big issue we have is overrides. It is best practice to never place an override in the Default Management Pack (which is where the override will go by default unless you pick something else). Instead, each management pack should have its own custom-built override pack. For example, if the MP is "Active Directory Server 2008 (Monitoring)" you would create a custom override MP named "Active Directory Server 2008 (Monitoring) - Override." Like most items in your environment, naming conventions are key here. If you don't come up with a naming scheme for custom MPs and stick to it, you will end up with a mess.
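To make the naming rule concrete, here is a minimal Python sketch of it. This is only an illustration of the convention described above, not something SCOM provides: the function name override_pack_name and the constant OVERRIDE_SUFFIX are my own hypothetical names, and the only thing taken from the example above is the " - Override" suffix.

```python
# Hypothetical helper: builds the override pack name for a given MP,
# following the "<MP name> - Override" convention described above.
OVERRIDE_SUFFIX = " - Override"


def override_pack_name(base_mp_name: str) -> str:
    """Return the display name of the override pack paired with base_mp_name."""
    name = base_mp_name.strip()
    if not name:
        raise ValueError("base management pack name must not be empty")
    # Do not append the suffix twice if we are handed an override pack name.
    if name.endswith(OVERRIDE_SUFFIX):
        return name
    return name + OVERRIDE_SUFFIX


print(override_pack_name("Active Directory Server 2008 (Monitoring)"))
# prints: Active Directory Server 2008 (Monitoring) - Override
```

Deriving the name from the base MP name, instead of typing it by hand each time, is one way to keep every override pack consistent.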
In my environment the overrides we do have defined are all over the place. Some are in "Default Management Pack", some are in "Test" or "Test MP", and very few follow any naming scheme.
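A rough way to picture the cleanup is a check that flags every override stored in a pack that breaks the rule. The Python sketch below assumes you already have a list of (override name, pack name) pairs from somewhere; how you export that from SCOM is not covered here. The function name misplaced_overrides and the override names in the sample inventory are made up for illustration; only the pack names come from the situation described above.

```python
# Hypothetical audit: flag overrides stored in packs that break the
# "<MP name> - Override" convention.
OVERRIDE_SUFFIX = " - Override"
DEFAULT_PACK = "Default Management Pack"


def misplaced_overrides(overrides):
    """Return the (override_name, pack_name) pairs stored in the wrong place.

    `overrides` is an iterable of (override_name, pack_name) tuples.
    """
    problems = []
    for override_name, pack_name in overrides:
        # The default pack is called out explicitly, although it would
        # also fail the suffix check on its own.
        if pack_name == DEFAULT_PACK or not pack_name.endswith(OVERRIDE_SUFFIX):
            problems.append((override_name, pack_name))
    return problems


# Made-up inventory; the override names are illustrative only.
inventory = [
    ("Disable disk space alert", "Default Management Pack"),
    ("Raise CPU threshold", "Test MP"),
    ("Tune replication monitor", "Active Directory Server 2008 (Monitoring) - Override"),
]

for override_name, pack_name in misplaced_overrides(inventory):
    print(f"{override_name!r} is stored in {pack_name!r}")
```

With this sample inventory, the first two entries are flagged and the third passes.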
Unfortunately, before I can fix the email flooding issue, I need to fix the poor location and naming scheme of our custom MPs.