Thursday, 9 June 2011

Part 1: Monitoring Windows Servers Agentlessly via WMI using NAGIOS on Ubuntu Server

This how-to shows how to install the open source software NAGIOS to monitor network devices such as switches, routers, servers, firewalls and UPSs, and to alert you when they have problems.

The base installation uses Ubuntu Server 11.04 (the latest release at the time of writing) for compatibility with the VMware CLI, as detailed later.

For the initial install, use Ubuntu Server 11.04 on a Virtual Machine (VM) with at least 512MB of RAM and an 8GB virtual disk, which can be thin provisioned.

After selecting your language, press F4 and choose to install a 'Minimal Virtual Machine' if you are using VMware. Once setup is complete, install VMware Tools.

Install NAGIOS according to this guide, using the default options but skipping Step 9.
Note: If you plan to monitor memory or pagefile usage on Windows Server 2008 or later, you will need the second (text-based) version of the check_wmi plugin, which includes 'checkmem08'. Don't forget to change the owner (chown) to nagios:nagios and mark the file as executable (+x).
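
As a minimal sketch of the ownership and permission step from the note above: the helper name prepare_plugin is my own, and the plugin path in the usage note is assumed from the command definitions later in this post.

```shell
# Set ownership and the executable bit on a Nagios plugin file.
# usage: prepare_plugin PATH [OWNER]   (OWNER defaults to nagios:nagios)
# prepare_plugin is a hypothetical helper name, not part of NAGIOS.
prepare_plugin() {
    plugin="$1"
    owner="${2:-nagios:nagios}"
    # Only mark the file executable if the ownership change succeeded.
    chown "$owner" "$plugin" && chmod +x "$plugin"
}
```

Run as root, the call would look like: prepare_plugin /usr/local/nagios/libexec/check_wmi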

A big thanks and all credit for being able to monitor Windows via WMI from NAGIOS to Matthieu Thibault!

First, install the check_wmi plugin according to Matthieu's blog:

Second, if you are using a Microsoft Active Directory domain -
1. Create a new Group called "No Access"
2. Create a new User called "nagios_svc" and set a secure, non-expiring password
3. Make nagios_svc a member of the "No Access" group and remove it from Domain Users, for security reasons
NOTE: To monitor an Active Directory domain controller, the user must be a member of the domain 'Administrators' group. Consider the security consequences first!

On each Windows server you wish to monitor, make this user a member of the local "Administrators" group.

On the NAGIOS monitoring server -
Edit: /usr/local/nagios/etc/nagios.cfg
Uncomment: cfg_file=/usr/local/nagios/etc/objects/windows.cfg
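
If you would rather script the uncomment step than open an editor, something like the following should work. It is only a sketch: enable_windows_cfg is a name I made up, and it assumes the line is commented out with a leading '#' as in a default install. It keeps a .bak copy of the file it changes.

```shell
# Uncomment the windows.cfg include in a Nagios main config file.
# usage: enable_windows_cfg /path/to/nagios.cfg
# enable_windows_cfg is a hypothetical helper name.
enable_windows_cfg() {
    cfg="$1"
    # Strip the leading '#' (and any spaces after it) from the windows.cfg line only;
    # a backup of the original file is written next to it as <file>.bak.
    sed -i.bak 's|^#[[:space:]]*\(cfg_file=/usr/local/nagios/etc/objects/windows\.cfg\)|\1|' "$cfg"
}
```

On the monitoring server this would be called as: enable_windows_cfg /usr/local/nagios/etc/nagios.cfg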

Edit: /usr/local/nagios/etc/objects/commands.cfg and add the following lines:
#Check Windows Drivesize
define command{
        command_name    wmi_drv
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkdrivesize -a $ARG1$ -w $ARG2$ -c $ARG3$
        }
#Check Windows CPU
define command{
        command_name    wmi_cpu
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkcpu -w $ARG1$ -c $ARG2$
        }
#Check Windows Memory <= 2003
define command{
        command_name    wmi_mem
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkmem -a $ARG1$ -w $ARG2$ -c $ARG3$
        }
#Check Windows Memory >= 2008
define command{
        command_name    wmi_mem08
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkmem08 -a $ARG1$ -w $ARG2$ -c $ARG3$
        }

#Check Windows Eventlog
define command{
        command_name    wmi_eventlog
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkeventlog -a $ARG1$,$ARG2$,$ARG3$
        }
#Check Windows Services
define command{
        command_name    wmi_service
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkservice -a $ARG1$
        }
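
It helps to see how the $ARGn$ placeholders get filled in when a service refers to one of these commands with a check_command such as wmi_drv!C:!85%!95%. The sketch below imitates that substitution in plain shell and awk. It is not NAGIOS's own code: expand_check_command is a made-up name, only $HOSTADDRESS$ and $ARGn$ are handled, and unlike NAGIOS it leaves any unmatched macro in place.

```shell
# Expand a command_line template the way the $HOSTADDRESS$ and $ARGn$ macros are filled in.
# usage: expand_check_command TEMPLATE HOSTADDRESS CHECK_COMMAND
# expand_check_command is a hypothetical helper; values must not contain backslashes.
expand_check_command() {
    awk -v tpl="$1" -v host="$2" -v cc="$3" '
    # Replace every literal occurrence of "from" in "s" with "to".
    function replace_all(s, from, to,    out, i) {
        out = ""
        while ((i = index(s, from)) > 0) {
            out = out substr(s, 1, i - 1) to
            s = substr(s, i + length(from))
        }
        return out s
    }
    BEGIN {
        # check_command is "name!arg1!arg2!..."; part[1] is the command name.
        n = split(cc, part, "!")
        tpl = replace_all(tpl, "$HOSTADDRESS$", host)
        for (k = 2; k <= n; k++)
            tpl = replace_all(tpl, "$ARG" (k - 1) "$", part[k])
        print tpl
    }'
}
```

Printing the expanded line and running it by hand as the nagios user is a handy way to check credentials and thresholds before NAGIOS schedules the check.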

Edit the file /usr/local/nagios/etc/objects/windows.cfg by deleting default contents and replacing with:
#Windows Test Host 2008 R2
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       Windows-Test-Host-A-2008R2        ; The name we're giving to this host
        alias           Test 2008 R2 Windows Server A      ; A longer name associated with the host
        address         10.3.11.8      ; IP address of the host
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2008R2
        service_description     WinMemory08R2
        check_command           wmi_mem08!physical!80%!90%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2008R2
        service_description     WinMemory08R2Pagefile
        check_command           wmi_mem08!page!70%!85%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2008R2
        service_description     WinDriveC
        check_command           wmi_drv!C:!85%!95%
        }
 
#Windows Test Host A
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       Windows-Test-Host-A-2003        ; The name we're giving to this host
        alias           Test 2003 Windows Server A      ; A longer name associated with the host
        address         10.3.11.32      ; IP address of the host
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2003
        service_description     WinDriveC
        check_command           wmi_drv!C:!5%!15%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2003
        service_description     WinMemory
        check_command           wmi_mem!physical!70%!80%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2003
        service_description     WinPagefile
        check_command           wmi_mem!page!5%!85%
        }
#Windows Test Host B
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       Windows-Test-Host-B-2003        ; The name we're giving to this host
        alias           Test 2003 Windows Server B      ; A longer name associated with the host
        address         10.3.11.31      ; IP address of the host
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinDriveC
        check_command           wmi_drv!C:!85%!95%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinService-Printspooler
        check_command           wmi_service!Spooler
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinService-FileReplication
        check_command           wmi_service!NtFrs
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinMemory
        check_command           wmi_mem!physical!80%!90%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinPagefile
        check_command           wmi_mem!page!30%!70%
        }
# Define a hostgroup for Windows machines
define hostgroup{
        hostgroup_name  windows-servers ; The name of the hostgroup
        alias           Windows Servers ; Long name of the group
        members         Windows-Test-Host-B-2003,Windows-Test-Host-A-2003
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinCPU
        check_command           wmi_cpu!5%!15%
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinMemory
        check_command           wmi_mem!physical!30%!35%
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinPagefile
        check_command           wmi_mem!page!5%!15%
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinEventlogSystem
        check_command           wmi_eventlog!system!1!24
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinEventlogApplication
        check_command           wmi_eventlog!application!1!24
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinEventlogSecurity
        check_command           wmi_eventlog!security!1!24
        }

Comment: Drive space thresholds can be defined once for a whole hostgroup, but the right percentage depends on the volume: 3% free space may be acceptable on a 2TB data partition but probably isn't on a 20GB boot partition. Memory is similar: 98% memory utilisation might be fine for a SQL server but not for a file server.
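
Because thresholds often end up differing per host, you may find yourself writing many near-identical service blocks. One possible shortcut, purely as a sketch, is a small shell function that prints a block in the same layout used above. The name print_service is my own invention and nothing in NAGIOS requires it.

```shell
# Print one service definition in the layout used in windows.cfg above.
# usage: print_service HOST_NAME SERVICE_DESCRIPTION CHECK_COMMAND
# print_service is a hypothetical helper name.
print_service() {
    printf 'define service{\n'
    printf '        %-23s %s\n' use generic-service
    printf '        %-23s %s\n' host_name "$1"
    printf '        %-23s %s\n' service_description "$2"
    printf '        %-23s %s\n' check_command "$3"
    printf '        }\n'
}
```

For example, print_service Windows-Test-Host-A-2003 WinDriveC 'wmi_drv!C:!85%!95%' prints a block you can paste into windows.cfg or redirect into a file. Keep the check_command in single quotes so the shell leaves the ! characters alone.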

Verify your configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

And restart NAGIOS: /etc/init.d/nagios restart
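
The two steps can also be chained so that NAGIOS is only restarted when the configuration check passes. This is a sketch under the assumption that the paths match the ones used in this post. The function name reload_nagios and the three variables are my own, and can be overridden if your install differs.

```shell
# Paths as used in this post; override the variables if your install differs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Restart NAGIOS only if the configuration verifies cleanly.
# reload_nagios is a hypothetical helper name.
reload_nagios() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; NAGIOS was not restarted." >&2
        return 1
    fi
}
```

If the check fails, run the verify command on its own to see the full error output.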

You should now see the new hosts and their services in the 'Services' view under 'Current Status'.

Initially, tune the thresholds until the view shows a "sea of green", assuming the infrastructure you are monitoring is functioning normally.

The final step is to enable email alerting. This tutorial assumes you have an SMTP server elsewhere on your network that NAGIOS can use to relay.
apt-get install nullmailer
When prompted during configuration, enter the host/domain you want the mail to appear to come from, and the SMTP server that allows relaying.

Configure /usr/local/nagios/etc/objects/contacts.cfg
define contact{
        contact_name                    admin1
        use                             generic-contact
        service_notification_options    c,r
        host_notification_options       d,u,r,f,s
        alias                           Admin1
        email                           admin1@example.int
        }
define contact{
        contact_name                    admin2
        use                             generic-contact
        service_notification_options    c,r
        host_notification_options       d,u,r,f,s
        alias                           Admin2
        email                           admin2@example.int
        }

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 admin1,admin2
        }

Then add the following line inside each host definition in /usr/local/nagios/etc/objects/windows.cfg:
        contacts        admin1,admin2
Add the same line inside each service definition that should send email notifications:
        contacts        admin1,admin2

Restart NAGIOS, and email notifications should be received when a host or service reaches one of the configured conditions (e.g. c=critical).

3 comments:

  1. Hello,

    The link to the check_wmi seems to be broken.
    Do you have any way to put the plugin somewhere else ?

    Thank you,

  2. hello fred, try it out ,

    http://www.edcint.co.nz/checkwmiplus/node/32

  3. Amazing knowledge sharing. Please keep it. Also share something for CPU temperature, Hardware failure, share folder sharing status, cluster , SAN and RAID.

    Thanks
