Friday, 10 June 2011

Install ESXi on USB or Flash Drive from Windows

Download the VMware ESXi .ISO file from www.vmware.com.

Download and install the HP USB Disk Storage Format Tool, IZArc and WinImage.

Use the HP USB Disk Storage Format Tool to format your USB device, particularly if it previously held a bootable OS or utility. A quick format should do, but if you can afford the time, perform a full format.

Use IZArc to extract the file imagedd.bz2 from the ESXi ISO file.

Use IZArc to extract the file imagedd from imagedd.bz2.

Open WinImage and select 'Disk' > 'Restore Virtual Hard Disk image on physical drive', select your USB device and click OK. In the file dialog, change the file type from 'Virtual Hard Disks (*.vhd)' to 'All', select the imagedd file extracted previously, and click Yes.

Reboot and test your bootable ESXi USB device.
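If a Linux machine is handy, the IZArc decompression and WinImage write steps can be approximated with bunzip2 and dd. This is a sketch under assumptions, not part of the Windows procedure: the helper name prepare_esxi_usb and the device path /dev/sdX are placeholders, and the helper deliberately only prints the dd command instead of running it.

```shell
# Hypothetical helper: Linux-side equivalent of the IZArc + WinImage steps.
# Assumes bzip2 is installed. Nothing is written to any disk; the dd command
# is only printed so the target device can be checked first.
prepare_esxi_usb() {
    local archive="$1"              # imagedd.bz2 extracted from the ESXi ISO
    local device="$2"               # USB device, e.g. /dev/sdX (placeholder)
    local image="${archive%.bz2}"   # path of the decompressed image

    # -k keeps imagedd.bz2, -f overwrites an older imagedd
    bunzip2 -k -f "$archive" || return 1
    echo "dd if=$image of=$device bs=1M"
}
```

For example, `prepare_esxi_usb imagedd.bz2 /dev/sdX` prints `dd if=imagedd of=/dev/sdX bs=1M`. Run that as root only after confirming the device really is the USB stick, because dd overwrites the whole device.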

Thursday, 9 June 2011

Part 1: Monitoring Windows Servers Agentlessly via WMI using NAGIOS on Ubuntu Server

This how-to shows how to install the open-source monitoring software NAGIOS to monitor network devices such as switches, routers, servers, firewalls and UPSs, and to alert you when they have problems.

The base installation uses Ubuntu Server 11.04 (the latest release at the time of writing), chosen for its compatibility with the VMware CLI, as detailed later.

For the initial install, create a Virtual Machine (VM) with at least 512MB of RAM and an 8GB virtual disk, which can be thin provisioned.

After selecting your language, press F4 and choose 'Minimal Virtual Machine' if you are installing on VMware. Once setup is complete, install VMware Tools.

Install NAGIOS according to this guide, using the default options but skipping Step 9.
Note: If you plan to monitor memory or pagefile usage on Windows Server 2008 or later, you will need the second (text-based) version of the check_wmi plugin, which includes the 'checkmem08' mode. Don't forget to change its owner to nagios:nagios (chown) and mark the file as executable (chmod +x).
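The chown/chmod step from the note can be wrapped in a small helper. This is a sketch: install_nagios_plugin is a hypothetical name, and the default directory assumes the Nagios source install from the guide above (/usr/local/nagios/libexec, the same path used in the commands later in this post).

```shell
# Hypothetical helper: copy a plugin into libexec, make it executable and,
# when run as root with a nagios account present, give it to nagios:nagios.
install_nagios_plugin() {
    local src="$1"
    local libexec="${2:-/usr/local/nagios/libexec}"
    local dest
    dest="$libexec/$(basename "$src")"

    cp "$src" "$dest" || return 1
    chmod +x "$dest"
    # chown only works as root, and only if the nagios user exists
    if [ "$(id -u)" -eq 0 ] && id nagios >/dev/null 2>&1; then
        chown nagios:nagios "$dest"
    fi
}
```

From a root shell, `install_nagios_plugin ./check_wmi` would place the plugin at /usr/local/nagios/libexec/check_wmi.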

A big thanks to Matthieu Thibault, who deserves all the credit for making it possible to monitor Windows via WMI from NAGIOS!

First, install the check_wmi plugin according to Matthieu's blog:

Second, if you are using a Microsoft Active Directory domain -
1. Create a new Group called "No Access"
2. Create a new User called "nagios_svc" and set a secure, non-expiring password
3. Make nagios_svc a member of the "No Access" group, set "No Access" as its primary group (Active Directory won't remove a user from its primary group), then remove it from Domain Users for security reasons
NOTE: To monitor an Active Directory domain controller, the user must be a member of the domain 'Administrators' group. Consider the security consequences first!

On each Windows server you wish to monitor, make this user a member of the local "Administrators" group.

On the NAGIOS monitoring server -
Edit: /usr/local/nagios/etc/nagios.cfg
Uncomment: cfg_file=/usr/local/nagios/etc/objects/windows.cfg
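If you prefer not to open an editor, the uncomment step can be done with sed. A sketch assuming GNU sed (for -i), as shipped with Ubuntu; enable_windows_cfg is a hypothetical helper.

```shell
# Hypothetical helper: uncomment the windows.cfg line in nagios.cfg
enable_windows_cfg() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    sed -i 's|^#[[:space:]]*\(cfg_file=/usr/local/nagios/etc/objects/windows\.cfg\)|\1|' "$cfg"
    # Return success only if the active line is now present
    grep -q '^cfg_file=/usr/local/nagios/etc/objects/windows\.cfg' "$cfg"
}
```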

Edit: /usr/local/nagios/etc/objects/commands.cfg and add the following lines:
#Check Windows Drivesize
define command{
        command_name    wmi_drv
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkdrivesize -a $ARG1$ -w $ARG2$ -c $ARG3$
        }
#Check Windows CPU
define command{
        command_name    wmi_cpu
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkcpu -w $ARG1$ -c $ARG2$
        }
#Check Windows Memory <= 2003
define command{
        command_name    wmi_mem
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkmem -a $ARG1$ -w $ARG2$ -c $ARG3$
        }
#Check Windows Memory >= 2008
define command{
        command_name    wmi_mem08
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkmem08 -a $ARG1$ -w $ARG2$ -c $ARG3$
        }

#Check Windows Eventlog
define command{
        command_name    wmi_eventlog
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkeventlog -a $ARG1$,$ARG2$,$ARG3$
        }
#Check Windows Services
define command{
        command_name    wmi_service
        command_line    /usr/local/nagios/libexec/check_wmi -H $HOSTADDRESS$ -u YOURDOMAIN/nagios_svc -p <Password> -m checkservice -a $ARG1$
        }
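Every command above repeats the domain account and password. Nagios can hold values like these as $USERn$ macros in resource.cfg (the default file already defines $USER1$ as the plugin directory), which also keeps the password out of commands.cfg. A sketch, assuming $USER2$ and $USER3$ are not already used in your install; add_wmi_credentials is a hypothetical helper.

```shell
# Hypothetical helper: store the WMI credentials once as Nagios user macros.
# Assumes $USER2$ and $USER3$ are free in resource.cfg.
add_wmi_credentials() {
    local resource="$1" user="$2" password="$3"
    # Single quotes stop the shell from expanding $USER2$ / $USER3$
    printf '$USER2$=%s\n$USER3$=%s\n' "$user" "$password" >> "$resource"
}
```

For example, `add_wmi_credentials /usr/local/nagios/etc/resource.cfg 'YOURDOMAIN/nagios_svc' '<Password>'`; each command_line could then use `-u $USER2$ -p $USER3$` instead of the literal account and password.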

Edit the file /usr/local/nagios/etc/objects/windows.cfg, delete its default contents and replace them with:
#Windows Test Host 2008 R2
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       Windows-Test-Host-A-2008R2        ; The name we're giving to this host
        alias           Test 2008 R2 Windows Server A      ; A longer name associated with the host
        address         10.3.11.8      ; IP address of the host
        }
define service{
        use                     generic-service
        host_name       Windows-Test-Host-A-2008R2
        service_description     WinMemory08R2
        check_command           wmi_mem08!physical!80%!90%
        }
define service{
        use                     generic-service
        host_name       Windows-Test-Host-A-2008R2
        service_description     WinMemory08R2Pagefile
        check_command           wmi_mem08!page!70%!85%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2008R2
        service_description     WinDriveC
        check_command           wmi_drv!C:!85%!95%
        }
 
#Windows Test Host A
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       Windows-Test-Host-A-2003        ; The name we're giving to this host
        alias           Test 2003 Windows Server A      ; A longer name associated with the host
        address         10.3.11.32      ; IP address of the host
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2003
        service_description     WinDriveC
        check_command           wmi_drv!C:!5%!15%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2003
        service_description     WinMemory
        check_command           wmi_mem!physical!70%!80%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-A-2003
        service_description     WinPagefile
        check_command           wmi_mem!page!5%!85%
        }
#Windows Test Host B
define host{
        use             windows-server  ; Inherit default values from a template
        host_name       Windows-Test-Host-B-2003        ; The name we're giving to this host
        alias           Test 2003 Windows Server B      ; A longer name associated with the host
        address         10.3.11.31      ; IP address of the host
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinDriveC
        check_command           wmi_drv!C:!85%!95%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinService-Printspooler
        check_command           wmi_service!Spooler
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinService-FileReplication
        check_command           wmi_service!NtFrs
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinMemory
        check_command           wmi_mem!physical!80%!90%
        }
define service{
        use                     generic-service
        host_name               Windows-Test-Host-B-2003
        service_description     WinPagefile
        check_command           wmi_mem!page!30%!70%
        }
# Define a hostgroup for Windows machines
define hostgroup{
        hostgroup_name  windows-servers ; The name of the hostgroup
        alias           Windows Servers ; Long name of the group
        members         Windows-Test-Host-B-2003,Windows-Test-Host-A-2003
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinCPU
        check_command           wmi_cpu!5%!15%
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinMemory
        check_command           wmi_mem!physical!30%!35%
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinPagefile
        check_command           wmi_mem!page!5%!15%
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinEventlogSystem
        check_command           wmi_eventlog!system!1!24
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinEventlogApplication
        check_command           wmi_eventlog!application!1!24
        }
define service{
        use                     generic-service
        hostgroup               windows-servers
        service_description     WinEventlogSecurity
        check_command           wmi_eventlog!security!1!24
        }

Comment: Drive space thresholds could be defined once for a hostgroup, but 3% free space may be acceptable on a 2TB data partition and probably isn't on a 20GB boot partition. Memory is similar: 98% utilisation might be fine for a SQL server, but not for a file server.
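Because thresholds end up being set per host, windows.cfg tends to grow by copy and paste, which makes it easy to attach a service to the wrong host_name. A small generator can reduce that risk; wmi_service_def is a hypothetical helper that only prints text in the same layout as the definitions above.

```shell
# Hypothetical helper: print one service definition in the layout used above
wmi_service_def() {
    local host="$1" description="$2" check="$3"
    printf 'define service{\n'
    printf '        use                     generic-service\n'
    printf '        host_name               %s\n' "$host"
    printf '        service_description     %s\n' "$description"
    printf '        check_command           %s\n' "$check"
    printf '        }\n'
}
```

For example, `wmi_service_def Windows-Test-Host-A-2008R2 WinDriveC 'wmi_drv!C:!85%!95%' >> /usr/local/nagios/etc/objects/windows.cfg`. The single quotes stop an interactive bash from treating `!` as history expansion.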

Verify your configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

And restart NAGIOS: /etc/init.d/nagios restart
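The two steps can be chained so that a broken configuration never gets restarted. A sketch, assuming `nagios -v` exits with a non-zero status when it finds errors; safe_restart and the override variables are hypothetical, and the default paths are the ones used in this post.

```shell
# Paths follow this post's source install; override them for testing.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_RESTART="${NAGIOS_RESTART:-/etc/init.d/nagios restart}"

# Hypothetical helper: restart only when the pre-flight check passes
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        $NAGIOS_RESTART   # unquoted on purpose so "command arg" splits
    else
        echo "Config check failed; run $NAGIOS_BIN -v $NAGIOS_CFG to see why" >&2
        return 1
    fi
}
```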

You should see something like this under the 'Current Status' > 'Services' view:

Initially, it's a good idea to tune the thresholds until the display shows a "sea of green", assuming the infrastructure you are monitoring is functioning normally.

The final step is to enable email alerting. This tutorial assumes you have an SMTP server elsewhere on your network that NAGIOS can use to relay.
apt-get install nullmailer
During installation you will be prompted for the host/domain name you want mail to appear to come from, and for the SMTP server that allows relaying.
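The relay answer ends up in nullmailer's remotes file, one "host protocol" entry per line. If you need to change it later without re-running the package configuration, a sketch follows; set_nullmailer_relay is a hypothetical helper and smtp.example.int a placeholder relay name.

```shell
# Hypothetical helper: point nullmailer at an SMTP relay.
# Each line of the remotes file is "host protocol"; smtp is assumed here.
set_nullmailer_relay() {
    local smarthost="$1"
    local confdir="${2:-/etc/nullmailer}"
    printf '%s smtp\n' "$smarthost" > "$confdir/remotes"
}
```

For example, `set_nullmailer_relay smtp.example.int` as root. A quick end-to-end check is to pipe a short message into the sendmail command that nullmailer provides: `printf 'Subject: test\n\nhello\n' | /usr/sbin/sendmail admin1@example.int`.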

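Configure /usr/local/nagios/etc/objects/contacts.cfg (the default file already defines an 'admins' contact group, so replace that definition with the one below rather than adding a duplicate):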
define contact{
        contact_name                    admin1
        use                             generic-contact
        service_notification_options    c,r
        host_notification_options       d,u,r,f,s
        alias                           Admin1
        email                           admin1@example.int
        }
define contact{
        contact_name                    admin2
        use                             generic-contact
        service_notification_options    c,r
        host_notification_options       d,u,r,f,s
        alias                           Admin2
        email                           admin2@example.int
        }

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 admin1,admin2
        }

Then add the following line inside each host definition in /usr/local/nagios/etc/objects/windows.cfg:
        contacts        admin1,admin2
Add the same line inside each service definition that should send email notifications:
        contacts        admin1,admin2
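Adding that line by hand to every definition is tedious. A sketch that does it with awk, assuming the definitions are laid out as in windows.cfg above, with each closing brace on its own line. add_contacts is a hypothetical helper; it adds the line to every host and service definition (hostgroup services included), so remove it from any service that shouldn't notify, and keep a backup of the file first.

```shell
# Hypothetical helper: insert a contacts line before the closing brace of
# every host and service definition in a Nagios object file.
add_contacts() {
    local cfg="$1" contacts="$2"
    awk -v c="$contacts" '
        /^[[:space:]]*define[[:space:]]+(host|service)[[:space:]]*[{]/ { inside = 1 }
        inside && /^[[:space:]]*[}]/ {
            printf "        contacts                %s\n", c
            inside = 0
        }
        { print }
    ' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}
```

For example, `add_contacts /usr/local/nagios/etc/objects/windows.cfg admin1,admin2`, followed by the `nagios -v` check.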

Restart NAGIOS, and email notifications should be received when a host or service reaches a notification state (e.g. c = critical).

Monday, 6 June 2011

Free Redundant Layer 3 Routing with VRRP 802.1Q VLANs using Debian Linux



How to configure redundant routing on Linux, with enterprise-grade features.

Distro: Debian 6 Server

Install a base Debian 6 server suited to your environment. Configure the first NIC (eth0) with your preferred default route, e.g. your firewall. During package selection we chose the default "Standard system utilities" and "SSH server" tasks.


If using VMware, create a new vSwitch with a port group whose VLAN ID is set to "All" (802.1Q tag 4095), so that tagged frames are passed through to the VM.

Add a second network adapter to the Virtual Machine, connected to the new vSwitch.
apt-get install vlan
Add the VLANs as per the diagram above.
vconfig add eth1 70
vconfig add eth1 80
vconfig add eth1 90
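vconfig needs the 8021q kernel module; modprobe loads it and is harmless if it is already loaded. A sketch wrapping both steps for any list of VLAN IDs; create_vlans is a hypothetical helper, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: load the 802.1Q module and add each VLAN to a parent NIC.
# Set DRY_RUN=1 to print the commands instead of executing them.
create_vlans() {
    local parent="$1" id run=""
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run="echo"
    fi
    $run modprobe 8021q || return 1
    for id in "$@"; do
        $run vconfig add "$parent" "$id" || return 1
    done
}
```

As root, `create_vlans eth1 70 80 90` creates eth1.70, eth1.80 and eth1.90.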

Configure the network interfaces file /etc/network/interfaces using your favourite text editor (mine's nano) with these additions:
auto eth0
auto eth1.70
iface eth1.70 inet static
        address 192.168.70.253
        netmask 255.255.255.0
        network 192.168.70.0
        broadcast 192.168.70.255
        vlan_raw_device eth1
auto eth1.80
iface eth1.80 inet static
        address 192.168.80.253
        netmask 255.255.255.0
        network 192.168.80.0
        broadcast 192.168.80.255
        vlan_raw_device eth1
auto eth1.90
iface eth1.90 inet static
        address 192.168.90.253
        netmask 255.255.255.0
        network 192.168.90.0
        broadcast 192.168.90.255
        vlan_raw_device eth1
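The three stanzas differ only in the VLAN ID. Assuming, as in this example, that each VLAN's third octet equals its VLAN ID, they can be generated instead of typed. vlan_stanza is a hypothetical helper; its last argument is the host octet (253 on RouterPri, 252 on RouterBak).

```shell
# Hypothetical helper: print an /etc/network/interfaces stanza for one VLAN.
# Assumes the subnet is 192.168.<vlan-id>.0/24, as in this example.
vlan_stanza() {
    local parent="$1" vid="$2" host="$3"
    local net="192.168.$vid"
    cat <<EOF
auto $parent.$vid
iface $parent.$vid inet static
        address $net.$host
        netmask 255.255.255.0
        network $net.0
        broadcast $net.255
        vlan_raw_device $parent
EOF
}
```

For example, `for v in 70 80 90; do vlan_stanza eth1 "$v" 253; done >> /etc/network/interfaces` on the primary router.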
Restart networking: /etc/init.d/networking restart

Enable IP forwarding (routing):
echo 1 > /proc/sys/net/ipv4/ip_forward
To make the change permanent, uncomment this line in /etc/sysctl.conf:
#net.ipv4.ip_forward = 1
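The uncomment step can also be scripted, after which `sysctl -p` reloads /etc/sysctl.conf. A sketch assuming GNU sed; enable_ip_forward is a hypothetical helper, and it accepts the line with or without spaces around the equals sign.

```shell
# Hypothetical helper: uncomment net.ipv4.ip_forward = 1 in sysctl.conf
enable_ip_forward() {
    local conf="${1:-/etc/sysctl.conf}"
    sed -i 's/^#[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1/net.ipv4.ip_forward = 1/' "$conf"
    # Return success only if the setting is now active in the file
    grep -q '^net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1' "$conf"
}
```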
 
Install a DHCP server
apt-get install isc-dhcp-server

Replace the contents of the default file /etc/dhcp/dhcpd.conf with:
log-facility local7;
subnet 192.168.70.0 netmask 255.255.255.0 {
  range 192.168.70.20 192.168.70.25;
  option domain-name-servers 192.168.70.253,192.168.70.252;
  option domain-name "example.int";
  option routers 192.168.70.254;
  option broadcast-address 192.168.70.255;
  default-lease-time 2400;
  max-lease-time 7200;
}
subnet 192.168.80.0 netmask 255.255.255.0 {
  range 192.168.80.20 192.168.80.25;
  option domain-name-servers 192.168.80.253,192.168.80.252;
  option domain-name "example.int";
  option routers 192.168.80.254;
  option broadcast-address 192.168.80.255;
  default-lease-time 2400;
  max-lease-time 7200;
}
subnet 192.168.90.0 netmask 255.255.255.0 {
  range 192.168.90.50 192.168.90.150;
  option domain-name-servers 192.168.90.253,192.168.90.252;
  option domain-name "example.int";
  option routers 192.168.90.254;
  option broadcast-address 192.168.90.255;
  default-lease-time 2400;
  max-lease-time 7200;
}
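Each subnet block follows the same pattern, so a generator helps keep them consistent and makes it easy to give the backup router a different pool. A sketch assuming the 192.168.<vlan-id>.0/24 addressing used here; dhcp_subnet is a hypothetical helper. The finished file can be syntax-checked with `dhcpd -t -cf /etc/dhcp/dhcpd.conf` before starting the server.

```shell
# Hypothetical helper: print one dhcpd.conf subnet block.
# Arguments: VLAN ID (used as the third octet), first and last pool octets.
dhcp_subnet() {
    local vid="$1" first="$2" last="$3"
    local net="192.168.$vid"
    cat <<EOF
subnet $net.0 netmask 255.255.255.0 {
  range $net.$first $net.$last;
  option domain-name-servers $net.253,$net.252;
  option domain-name "example.int";
  option routers $net.254;
  option broadcast-address $net.255;
  default-lease-time 2400;
  max-lease-time 7200;
}
EOF
}
```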
 
Start the DHCP server:
/etc/init.d/isc-dhcp-server start
 
Install a caching-only DNS server:
apt-get install bind9
Edit /etc/bind/named.conf.options and add these lines inside the options { } block, after the // comments:
        listen-on { any; };
        forwarders {<your DNS forwarder1>;<your DNS forwarder2>;};
Start BIND:
/etc/init.d/bind9 start
 
Now shut down your VM (or physical server) and clone it to another physical server.

Boot the clone and change its hostname and IP addresses in the following files:
/etc/hostname [change from RouterPri to RouterBak]
/etc/hosts [change from RouterPri to RouterBak]
/etc/network/interfaces [Change IPs from .253 to .252]
/etc/dhcp/dhcpd.conf [change the IP pools so they don't overlap with the primary's]
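The hostname and address changes can be scripted on the clone. A sketch assuming GNU sed and the file layout used in this post; rename_to_backup is a hypothetical helper. It only rewrites `address 192.168.x.253` lines in the interfaces file, and it leaves dhcpd.conf alone because the pool ranges are site-specific.

```shell
# Hypothetical helper: turn a RouterPri clone into RouterBak.
# The optional argument is the config root (defaults to /etc).
rename_to_backup() {
    local etc="${1:-/etc}"
    sed -i 's/RouterPri/RouterBak/g' "$etc/hostname" "$etc/hosts"
    # Only interface addresses ending in .253 become .252
    sed -i 's/^\([[:space:]]*address 192\.168\.[0-9]*\)\.253$/\1.252/' "$etc/network/interfaces"
}
```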
 
Reboot the clone, then boot the original.
 
Install 'keepalived' and 'vrrpd'
apt-get install keepalived vrrpd -y
 
On RouterPri, configure the file /etc/keepalived/keepalived.conf:
vrrp_instance VR1 {
        state MASTER
        interface eth1
        virtual_router_id 1
        priority 100
        authentication {
                auth_type PASS
                auth_pass password
        }
        virtual_ipaddress {
                192.168.70.254/24 brd 192.168.70.255 dev eth1.70
                192.168.80.254/24 brd 192.168.80.255 dev eth1.80
                192.168.90.254/24 brd 192.168.90.255 dev eth1.90
        }
}
 
Configure the same file on RouterBak:
vrrp_instance VR1 {
        state BACKUP
        interface eth1
        virtual_router_id 1
        priority 50
        authentication {
                auth_type PASS
                auth_pass password
        }
        virtual_ipaddress {
                192.168.70.254/24 brd 192.168.70.255 dev eth1.70
                192.168.80.254/24 brd 192.168.80.255 dev eth1.80
                192.168.90.254/24 brd 192.168.90.255 dev eth1.90
        }
}
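The two keepalived files must stay identical apart from state and priority (keepalived's states are MASTER and BACKUP), so generating both from one function avoids drift. A sketch reproducing the configuration above; keepalived_conf is a hypothetical helper.

```shell
# Hypothetical helper: print keepalived.conf for one router.
# Arguments: state (MASTER or BACKUP) and priority.
keepalived_conf() {
    local state="$1" priority="$2" vid
    cat <<EOF
vrrp_instance VR1 {
        state $state
        interface eth1
        virtual_router_id 1
        priority $priority
        authentication {
                auth_type PASS
                auth_pass password
        }
        virtual_ipaddress {
EOF
    for vid in 70 80 90; do
        echo "                192.168.$vid.254/24 brd 192.168.$vid.255 dev eth1.$vid"
    done
    echo "        }"
    echo "}"
}
```

For example, `keepalived_conf MASTER 100 > /etc/keepalived/keepalived.conf` on RouterPri and `keepalived_conf BACKUP 50 > /etc/keepalived/keepalived.conf` on RouterBak.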
At this point, you can assign a NIC (using VMXNET3) to one of your other VMs and use VLAN tags to test DHCP, DNS and primary-to-backup L3 failover.

A dynamic routing protocol is required to notify other Layer 3 devices on the network of the route change when failover occurs. For this, we will use the 'quagga' routing daemons, configured with the RIPv2 protocol.
 
apt-get install quagga
 
Edit /etc/quagga/daemons
zebra=yes
ripd=yes
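The daemons file holds one `name=yes|no` line per daemon, so the two switches can be flipped with sed. A sketch assuming GNU sed; enable_quagga_daemons is a hypothetical helper.

```shell
# Hypothetical helper: set the named daemons to "yes" in Quagga's daemons file
enable_quagga_daemons() {
    local daemons="$1" d
    shift
    for d in "$@"; do
        sed -i "s/^$d=.*/$d=yes/" "$daemons"
    done
}
```

For example, `enable_quagga_daemons /etc/quagga/daemons zebra ripd` as root.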
 
Edit /etc/quagga/zebra.conf
hostname Router
password zebra
enable password zebra
!
interface eth0
 ipv6 nd suppress-ra
!
interface eth1
 ipv6 nd suppress-ra
!
interface eth1.70
 ipv6 nd suppress-ra
!
interface eth1.80
 ipv6 nd suppress-ra
!
interface eth1.90
 ipv6 nd suppress-ra
!
interface lo
!
ip forwarding
!
!
line vty
! 
 
Edit /etc/quagga/ripd.conf on both routers, replacing Router* in the hostname with RouterPri or RouterBak as appropriate:
hostname Router*.example.int
password zebra
log file /var/log/quagga/ripd.log
log stdout
!
router rip
 version 2
 timers basic 30 120 120
 redistribute kernel
 redistribute connected
 redistribute static
 network 192.168.70.0/24
 network 192.168.80.0/24
 network 192.168.90.0/24
 network eth0
 network eth1
 network eth1.70
 network eth1.80
 network eth1.90
 neighbor <Your L3 Device 1>
 neighbor <Your L3 Device 2>
!
line vty
!
Start the quagga daemon: /etc/init.d/quagga start
You can now test failover and failback of your HA solution by disconnecting and reconnecting NICs on the primary router while monitoring /var/log/syslog.
You should see only a few packets dropped during failover and failback between the routers, depending on how quickly other L3 devices converge.
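To pick out just the VRRP state changes while testing, syslog can be filtered. A sketch with an assumption: the pattern matches keepalived's "Transition to ... STATE" and "Entering ... STATE" messages as logged by keepalived 1.x, and may need adjusting for your version. vrrp_transitions is a hypothetical helper.

```shell
# Hypothetical helper: show keepalived VRRP state changes from a syslog file.
# The message wording is an assumption; adjust the pattern if yours differs.
vrrp_transitions() {
    local log="${1:-/var/log/syslog}"
    grep -E 'Keepalived_vrrp.*(Transition to|Entering) (MASTER|BACKUP) STATE' "$log"
}
```

Running `vrrp_transitions` on both routers during a test should show RouterBak entering MASTER state on failover and returning to BACKUP on failback.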
Your syslog during failover/failback should look something like this:
 
Thoughts on running this in production...
  • Consider installing each router on the local disks of your VMware hosts
    • Set them to auto-boot with the hosts
      • Ensure you can reach them from a machine on the same subnet/VLAN in case of problems
 
Please let me know your thoughts and feedback if you've found this useful!