Sunday, October 3, 2010

System Administrator Responsibility




Daily Monitoring Tasks

Table 1.5 Daily Tasks and Their Importance

| Task | Importance |
| --- | --- |
| Verify that all domain controllers are communicating with the central monitoring console or collector. | A communication failure between a domain controller and the monitoring infrastructure prevents you from receiving the alerts you need to examine and resolve. |
| View and examine all new alerts on each domain controller, resolving them in a timely fashion. | This precaution helps you avoid service outages. |
| Resolve alerts indicating that the following services are not running: FRS, Net Logon, KDC, W32Time, ISMSERV. MOM reports these as Active Directory Essential Services. | Active Directory depends on these services. They must be running on every domain controller. |
| Resolve alerts indicating that SYSVOL is not shared. | Active Directory cannot apply Group Policy unless SYSVOL is shared. |
| Resolve alerts indicating that the domain controller is not advertising itself. | Domain controllers must register DNS records to be able to respond to LDAP and other service requests. |
| Resolve alerts indicating time synchronization problems. | The Kerberos authentication protocol requires that time be synchronized between all domain controllers and the clients that use it. |
| Resolve all other alerts in order of severity. If alerts are given error, warning, and information status similar to the event log, resolve alerts marked error first. | The highest-priority alerts indicate the most serious risk to your service level. |
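As a rough illustration of two of the daily checks above, the following Python sketch flags essential services that are not running and orders alerts so that errors come first. The input shapes (a dict of service states and a list of alert dicts) are hypothetical and do not reflect the format of MOM or any other monitoring tool.

```python
# Service names as listed in the table above; a real check may need to map
# them to the names your monitoring tool reports.
ESSENTIAL_SERVICES = ("FRS", "Net Logon", "KDC", "W32Time", "ISMSERV")

# Lower rank sorts first: error before warning before information.
SEVERITY_RANK = {"error": 0, "warning": 1, "information": 2}


def stopped_essential_services(service_states):
    """Return the essential services that are not reported as running.

    service_states is a hypothetical dict such as {"KDC": "running"}.
    A service that is missing from the dict counts as not running.
    """
    return [
        name
        for name in ESSENTIAL_SERVICES
        if service_states.get(name, "").lower() != "running"
    ]


def triage(alerts):
    """Order alerts by severity, errors first.

    Each alert is a hypothetical dict with a "severity" key. The sort is
    stable, so alerts of equal severity keep their original order, and
    unknown severities sort last.
    """
    return sorted(
        alerts,
        key=lambda alert: SEVERITY_RANK.get(
            alert["severity"].lower(), len(SEVERITY_RANK)
        ),
    )
```

Feeding the result of `triage` into the daily review keeps the operator working from the most serious alerts downward.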

 

Weekly Monitoring Tasks

Table 1.6 Weekly Tasks and Their Importance

| Task | Importance |
| --- | --- |
| Review the Time Synchronization Report to detect intermittent problems and resolve time-related alerts. | The Kerberos authentication protocol requires that time be synchronized between all domain controllers and the clients that use it. |
| Review the Authentication Report to help resolve problems generated by computer accounts with expired passwords. | Expired passwords must be reset to allow the computers to authenticate and participate in the domain. |
| Review the Duplicate Service Principal Name Report to list all security principals that have a service principal name (SPN) conflict. | User or computer accounts cannot be authenticated or log on if they share an SPN with another account. |
| Review a report of the top alerts generated by the Active Directory monitoring indicators and resolve the items that occur most frequently. | This report shows the alerts that occur most often. Focusing on the top alert generators significantly reduces the number of alerts seen by the operator. |
| Review the report that lists all trust relationships in the forest and check for obsolete, unintended, or broken trusts. | Authentication between domains or forests requires trust relationships. |
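The duplicate SPN review above comes down to finding any SPN held by more than one account. A minimal Python sketch of that idea follows; the input dict of account names to SPN lists is hypothetical, and comparing SPNs case-insensitively is an assumption of this sketch, not something the report is known to do.

```python
from collections import defaultdict


def find_duplicate_spns(account_spns):
    """Map each conflicting SPN to the sorted list of accounts that hold it.

    account_spns is a hypothetical dict of account name -> list of SPN
    strings. SPNs are lowercased before comparison (an assumption).
    """
    holders = defaultdict(set)
    for account, spns in account_spns.items():
        for spn in spns:
            holders[spn.lower()].add(account)
    # Keep only SPNs registered on more than one account.
    return {
        spn: sorted(accounts)
        for spn, accounts in holders.items()
        if len(accounts) > 1
    }
```

An empty result means no conflicts were found in the data supplied; it says nothing about accounts that were not included in the input.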


Monthly Monitoring Tasks

Table 1.7 Monthly Tasks and Their Importance

| Task | Importance |
| --- | --- |
| Verify that all domain controllers are running the same service pack and hotfixes. | Issues can arise if distributed services are running different versions of software. |
| Review all Active Directory reports and adjust thresholds as needed. Examine each report and determine which reports, data, and alerts are important for your environment and service level agreement. | Examining the data that is relevant to your environment allows you to set alert thresholds that match your service level commitments. |
| Review the Replication Monitoring Report to verify that replication throughout the forest occurs within acceptable limits. | Timely replication helps ensure that you meet your service level agreements. |
| Review the Active Directory response time reports. | Services must respond quickly for the system to function properly and for applications such as e-mail to work. |
| Review the domain controller disk space reports. | The drives containing the Active Directory database and log files must have sufficient free space to accommodate growth and routine processing. |
| Review all performance-related reports. These reports are called Health Monitoring reports in MOM. | These reports can help you determine the baseline for your environment and adjust thresholds. |
| Review all performance-related reports for capacity planning purposes to ensure that you have enough capacity for current and expected growth. These reports are called Health Monitoring reports in MOM. | These reports help you track growth trends in your environment and plan for future hardware and software needs. |
| Adjust performance counter thresholds or disable rules that are not applicable to your environment or that generate irrelevant alerts. | Monitoring indicators must be adjusted to suit your environment. The goal is to provide alerts that are concise, highly relevant, and lead an operator to resolve the problem. |
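The first monthly task, confirming that every domain controller runs the same service pack and hotfixes, can be sketched as a comparison against the most common patch level. The Python below is only an illustration: the inventory dict, the domain controller names, and the hotfix identifiers are hypothetical, and treating the most common combination as the baseline is a design choice of this sketch.

```python
from collections import Counter


def patch_level_outliers(patch_levels):
    """Return (baseline, outliers) for a set of domain controllers.

    patch_levels is a hypothetical dict of DC name -> (service_pack, hotfixes),
    where hotfixes is an iterable of hotfix identifiers. The baseline is the
    most common combination; outliers maps each differing DC to its own level.
    """
    # Hotfix order does not matter, so compare them as sets.
    normalized = {
        dc: (service_pack, frozenset(hotfixes))
        for dc, (service_pack, hotfixes) in patch_levels.items()
    }
    if not normalized:
        return None, {}
    baseline, _count = Counter(normalized.values()).most_common(1)[0]
    outliers = {
        dc: level for dc, level in normalized.items() if level != baseline
    }
    return baseline, outliers
```

If the outliers are actually the servers that have already been updated, the baseline itself is out of date, so the result still needs a human reading.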


Daily Routine Tasks

 

The following are examples of daily routine tasks in the Exchange environment:

Perform backups. Each day, perform the necessary system and application backups to tape, including a backup of the Exchange Server 2003 information stores. The backup content and schedule should be driven by identified risks and their respective contingencies. Combine archive-to-disk methods with tape backups as necessary to back up particular data sets or logs to tape. Perform a full backup at least once a week. Configure automatic notification of backup success or failure.

Verify backups. Verify that the necessary system and application backups have taken place without critical errors or failures. As part of risk analysis and contingency planning, there might be a requirement for spare servers on which the data in backups can be verified. Exchange Server 2003 database logs should have been automatically deleted after successful backup. In the event that errors or failures have occurred, take the following actions:

Consult the backup operations guide.

Review each error to understand its impact on the backup process, especially if backup quality or integrity has been affected.

If any portions of the backup have failed or any of the errors are significant, log a ticket with the help desk. Restart that portion of the backup immediately, or reschedule it to take place during the most appropriate time period. This minimizes performance impacts and service disruption.

When the ticket is resolved, make sure that the solution is documented in the help desk system and that the ticket is closed.
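The follow-up steps above can be sketched as a small decision function over the results of the nightly jobs. This Python example is an illustration only: the job records and their "name", "status", and "errors" fields are hypothetical and do not correspond to the output of any particular backup product.

```python
def backup_follow_ups(job_results):
    """Return the follow-up action for each backup job that needs one.

    job_results is a hypothetical list of dicts with the keys "name",
    "status" ("ok", "warning", or "failed") and "errors" (a list of
    messages). Jobs that completed cleanly are omitted from the result.
    """
    actions = {}
    for job in job_results:
        if job["status"] == "failed":
            # A failed portion gets a help desk ticket and a new run.
            actions[job["name"]] = "log ticket and restart or reschedule"
        elif job["errors"]:
            # Completed with errors: review their impact on the backup.
            actions[job["name"]] = "review errors"
    return actions
```

Deciding whether a reviewed error is significant enough for a ticket remains a judgment call that the sketch leaves to the operator.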

Verify directory service availability. Verify that directory replication for both Active Directory and the File Replication service (FRS) is functioning correctly (without errors) and on schedule between domain controllers in all locations. Ensure that the Update Sequence Numbers (USNs) are correct on all domain controllers; this is what determines whether replication is functioning correctly. Warnings and errors can be viewed in the Directory Service and FRS logs on all domain controllers. Tools such as Dcdiag.exe, Repadmin.exe, and Replmon.exe can also be used to monitor the real-time status and performance of replication.

Check available disk space. Check disk space on the servers and the SAN to ensure that sufficient free space exists. Tools such as System Monitor, MOM, or third-party monitoring tools can be used to monitor available disk space. A more detailed check at the partition or application level can also be performed; use automation to ease administrative effort. Evaluate a process for long-term data archiving.
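Where automation is wanted for the free-space check, a minimal Python sketch using the standard library's `shutil.disk_usage` might look like the following. The 15 percent default threshold is an arbitrary assumption and should be replaced with whatever limit suits your environment.

```python
import shutil


def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    return 100.0 * free_bytes / total_bytes


def volumes_low_on_space(paths, min_free_percent=15.0):
    """Return {path: free percent} for volumes below the threshold.

    paths is a list of directories or drive roots to check. The default
    threshold of 15 percent is an assumption of this sketch.
    """
    low = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        percent = free_percent(usage.total, usage.free)
        if percent < min_free_percent:
            low[path] = round(percent, 1)
    return low
```

A scheduled job could call `volumes_low_on_space` with the drives that hold the databases and logs and raise a notification when the result is not empty.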

Verify successful completion of database maintenance. Use Event Viewer in Windows to verify whether Exchange Server 2003 online database defragmentation was successful.

Check SMTP queue. Use Exchange System Manager to ensure that the Exchange Server 2003 SMTP queue connection state is "ready" or "active" and that queues are not "blocked," becoming full, or waiting for routing or directory lookup information. Configure Service Monitoring to automatically notify the administrator when the queue exceeds a set threshold.
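The queue check described above combines a state test with a size threshold. The Python sketch below shows that logic in isolation; the queue records, their field names, and the default limit of 500 messages are all hypothetical and are not taken from Exchange System Manager.

```python
def queues_needing_attention(queues, max_messages=500):
    """Return (queue name, reasons) for each queue that looks unhealthy.

    queues is a hypothetical list of dicts with the keys "name", "state",
    and "messages". A queue is flagged when its state is neither "ready"
    nor "active", or when it holds more than max_messages messages.
    """
    healthy_states = {"ready", "active"}
    flagged = []
    for queue in queues:
        reasons = []
        if queue["state"].lower() not in healthy_states:
            reasons.append("state is " + queue["state"])
        if queue["messages"] > max_messages:
            reasons.append("%d messages queued" % queue["messages"])
        if reasons:
            flagged.append((queue["name"], reasons))
    return flagged
```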

Verify availability of Exchange Server 2003 stores. Use Event Viewer or MOM to verify that all mailbox and public folder stores are available and functioning normally. Configure automatic notification of the Exchange administrator if an Exchange Server 2003 store becomes unavailable.

Verify Exchange Server 2003 availability. Make sure all computers running Exchange Server in the entire Exchange organization are available. Server performance counters (CPU, disk I/O, network I/O, physical and virtual memory) should not exceed their thresholds. Use Exchange System Manager, Event Viewer, Computer Management Console, or MOM to configure automatic notification in case Exchange Server 2003 becomes unavailable.

Verify Exchange connectors. Make sure all Exchange Server 2003 connectors are available and functioning. Use Exchange System Manager to validate the status of the connectors.

Verify network connectivity and services. Verify that network connectivity (LAN, WAN, Internet) is functional and that network services are available (DNS, DHCP, proxy). Use Ping, Ipconfig, Tracert, System Monitor, and/or MOM to check and monitor.
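Alongside Ping and Tracert, a scripted check can confirm that a name resolves and that a service port accepts connections. The following Python sketch uses only the standard `socket` module; which hosts and ports to test, and the three-second timeout, are assumptions to adapt to your own network.

```python
import socket


def resolves(hostname):
    """Return True if the name can be resolved to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time.

    The default timeout of three seconds is an assumption of this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

A successful TCP connection shows only that something is listening on the port; it does not prove that the service behind it is answering requests correctly.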

Review open service tickets. Review each Exchange Server 2003 "open" help desk ticket. The ticket owner should follow up on resolving the issue within the established SLA. Document the resolution, or escalate as appropriate.

Review Windows System and Application logs. Check the Windows Server 2003 System, Security, and Application logs on all computers running Exchange Server. On domain controllers, also check the Directory Service and File Replication Service logs. Document and investigate all errors and warnings. Use automatic notification when possible to minimize administrative effort.

 

