Checkpoint: Monitoring HA Failover – WIP

This is an attempt to find a good way of monitoring and logging what is going on in the HA module. It’s a work in progress, so please feel free to contribute.

SmartCenter

The first script below is triggered by a custom alert and writes to a log file in the /var/tmp/clusterxl_alert directory on the SmartCenter. With the cron job further down, a daily email can be sent with a summary of the day’s alerts. This was posted to CPUG by yheffen – https://www.cpug.org/forums/clustering-security-gateway-ha-clusterxl/9992-ha-failover-log-files.html. Originally written for the Korn shell, it works equally well in bash.

#!/bin/bash

DIR="/var/tmp/clusterxl_alert"
DAILY_LOG="$DIR/alert_daily.log"
LOG="$DIR/alert.log"

mklog () {
        if [ ! -f "$1" ]; then
                touch "$1"
                chmod 644 "$1"
        fi
}

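# make sure the log directory and the main log file exist
mkdir -p "$DIR"
mklog "$LOG"

# each alert arrives on stdin; append it to both logs
# (IFS= and -r keep leading spaces and backslashes intact)
while IFS= read -r ALERT; do
        echo "$ALERT" >> "$DAILY_LOG"
        echo "$ALERT" >> "$LOG"
done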

The path to the script is entered as one of the “UserDefined scripts” in the “Policy > Global Properties > Log and Alert > Alert Commands” window. Then, in the “ClusterXL” window of the cluster object’s properties, select this User Defined Alert in the “Tracking” section.
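Since the script only reads alert lines from standard input, it can be tried by hand before it is wired into Global Properties. Here is a minimal sketch of that, assuming a scratch directory instead of /var/tmp/clusterxl_alert and a made-up alert line (the real alert text will look different). The log_alerts function name is mine, not part of the original script.

```shell
#!/bin/bash
# Sketch: same logic as the alert script, wrapped in a function so it can be fed test input.
# DIR is a scratch directory (assumption); the alert text is invented for illustration.
DIR="$(mktemp -d)"
DAILY_LOG="$DIR/alert_daily.log"
LOG="$DIR/alert.log"

log_alerts () {
        # Append every line from stdin to both logs
        while IFS= read -r ALERT; do
                echo "$ALERT" >> "$DAILY_LOG"
                echo "$ALERT" >> "$LOG"
        done
}

# Feed one fake alert in on stdin, the same way the alert command would
printf '%s\n' "sample alert: member 1 down" | log_alerts

cat "$LOG"
```

If both files end up containing the sample line, the script side is working and any remaining problems are in the alert configuration.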

Cron job code:

0 5 * * * [ -f /var/tmp/clusterxl_alert/alert_daily.log ] && mailx -s "ClusterXL Alerts" me@example.com < /var/tmp/clusterxl_alert/alert_daily.log && rm /var/tmp/clusterxl_alert/alert_daily.log
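The && chain in the cron line means the daily log is only removed if the mail command exits successfully, so a failed send leaves the alerts in place for the next run. A small sketch of that logic follows, using a hypothetical send_mail function as a stand-in for mailx so it can be run without a mail setup. The mail_and_rotate name and the scratch file are also assumptions.

```shell
#!/bin/bash
# Sketch of the cron one-liner's logic. send_mail is a hypothetical stand-in for
# mailx -s "ClusterXL Alerts" me@example.com, so this runs without a mail setup.
DAILY_LOG="$(mktemp)"
echo "sample alert" > "$DAILY_LOG"

send_mail () {
        # Stand-in: swallow stdin and return the exit status passed as $1
        cat > /dev/null
        return "$1"
}

mail_and_rotate () {
        # $1 = exit status the stand-in mailer should return
        [ -f "$DAILY_LOG" ] && send_mail "$1" < "$DAILY_LOG" && rm "$DAILY_LOG"
}

# Mail fails: the chain stops before rm, so the log is kept
mail_and_rotate 1 || true
if [ -f "$DAILY_LOG" ]; then echo "kept after failed send"; fi

# Mail succeeds: the log is removed
mail_and_rotate 0 || true
if [ ! -f "$DAILY_LOG" ]; then echo "removed after successful send"; fi
```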

Security Gateway

This next script, which is very quick and dirty, monitors the interfaces using “cphaprob -a if”. It polls every 2 seconds, writes the result to a file (ha_polled.txt) and compares it against a reference file (ha_ref.txt), which is created when the script first runs. If a difference is found, it is logged to the ha_alert.log file. There are better ways to do this but, as I said, it’s quick and dirty 🙂

#!/bin/bash

# variables
DIR="/var/tmp"
REFERENCE="$DIR/ha_ref.txt"
POLLED="$DIR/ha_polled.txt"
LOG="$DIR/ha_alert.log"

# functions

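mkref () {
	# save the current interface status as the reference (line breaks preserved)
	cphaprob -a if > "$REFERENCE"
}

mkpoll () {
	# save the current interface status for comparison
	cphaprob -a if > "$POLLED"
}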

# main process

# make reference file
mkref

echo "Entering polling loop, use ctrl-c or"
echo "\"kill \$(pgrep ${0##*/})\" from a different terminal to exit"
echo
# Poll every 2 seconds and compare until ctrl-c.
# If the status changes, log it and then make new reference data
while true; do
	mkpoll
	DIFF=$(diff "$REFERENCE" "$POLLED")
	if [ -n "$DIFF" ]; then
		echo "Change logged to $LOG"
		echo "" >> "$LOG"
		echo "$DIFF" >> "$LOG"
		mkref
	fi
	# wait on every pass, not only after a change
	sleep 2
done

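Running this as admin in expert mode with nohup and an ampersand keeps the process running in the background even if the terminal is disconnected (an ampersand on its own does not protect the job from the hangup signal sent when the session drops):

[Expert@gw]# nohup ./ha_monitor.sh &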

One issue here is that if an interface is down, “cphaprob -a if” shows the number of seconds it has been down for:

[Expert@gw]# cphaprob -a if

Required interfaces: 4
Required secured interfaces: 2

eth0 UP sync(secured), multicast
eth1 Inbound: DOWN (4.7 secs)  Outbound: DOWN (5 secs) sync(secured), multicast
eth2 UP non sync(non secured), multicast
eth3 UP non sync(non secured), multicast

The script will therefore see a discrepancy on every poll as the seconds count increases, and will create a log entry every 2 seconds until the interface comes back up. Like I said, quick, dirty and a work-in-progress 🙂
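One possible way round this is to strip the seconds counters before comparing. The sketch below is only that, a sketch: strip_secs is a helper name I made up, and the sed pattern assumes the counters always look like the “(4.7 secs)” and “(5 secs)” in the output above.

```shell
#!/bin/bash
# Sketch: strip the "(N secs)" counters so a DOWN interface compares equal across polls.
# strip_secs is a hypothetical helper; the pattern assumes the format shown in the sample output.
strip_secs () {
	# basic regex: optional spaces, "(", digits or dots, " secs)"
	sed 's/ *([0-9.][0-9.]* secs)//g'
}

# Two polls of the same DOWN interface, taken 2 seconds apart
POLL1='eth1 Inbound: DOWN (4.7 secs)  Outbound: DOWN (5 secs) sync(secured), multicast'
POLL2='eth1 Inbound: DOWN (6.7 secs)  Outbound: DOWN (7 secs) sync(secured), multicast'

A=$(echo "$POLL1" | strip_secs)
B=$(echo "$POLL2" | strip_secs)

echo "$A"
if [ "$A" = "$B" ]; then
	echo "no change between polls"
fi
```

In the polling script the filter would sit in both mkref and mkpoll, piping the “cphaprob -a if” output through strip_secs before it is written to the file. An interface going down or coming back up still changes the line, so that would still be logged.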

 

EDIT:

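Here is a new script. It keeps the “cphaprob stat” output in variables instead of files, and whenever that output changes it logs the output of “cphaprob stat”, “cphaprob list” and “cphaprob -a if”: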

#!/bin/bash

# variables
HOSTNAME=$(hostname)
DIR="/var/tmp"
LOG="$DIR/${HOSTNAME}_hamon.log"

# functions

mkref () {
	echo "Making new reference  .." >> "$LOG"
	REFERENCE="$(cphaprob stat)"
	echo "Done" >> "$LOG"
	echo "" >> "$LOG"
}

mkpoll () {
	POLLED="$(cphaprob stat)"
}

getAndLogVals () {
	CPHAPROBSTAT=$(cphaprob stat)
	# drop the lines that change on every call or carry no state
	CPHAPROBLIST=$(cphaprob list | grep -v -e "Time since" -e "Registration number" -e "Timeout: none")
	CPHAPROBAIF=$(cphaprob -a if)
	echo "" >> $LOG
	echo "cphaprob stat:" >> $LOG
	echo "--------------" >> $LOG
	echo "$CPHAPROBSTAT" >> $LOG
	echo "" >> $LOG
	echo "cphaprob list:" >> $LOG
	echo "--------------" >> $LOG
	echo "$CPHAPROBLIST" >> $LOG
	echo "" >> $LOG
	echo "cphaprob -a if:" >> $LOG
	echo "---------------" >> $LOG
	echo "$CPHAPROBAIF" >> $LOG
	echo "" >> $LOG
}

# main

if [ -f "$LOG" ]; then
	echo "Removing old log file .."
	rm "$LOG"
fi

echo "Starting logging at $(date)" >> "$LOG"
echo "" >> "$LOG"

# Record original vals to the log 
getAndLogVals

# get reference vals
mkref

echo "Monitoring Failover status, use ctrl-c or \"kill \$(pgrep ${0##*/})\" from a different terminal to exit"

# Poll once a second and compare until ctrl-c. If status changes, log and get new reference data
while true; do
	mkpoll
	if [ "$POLLED" != "$REFERENCE" ]; then
		DIFF="$REFERENCE / $POLLED"
		echo "" >> "$LOG"
		echo "=============================================================================" >> "$LOG"
		echo "" >> "$LOG"
		date >> "$LOG"
		echo "" >> "$LOG"
		echo "HA Status Change detected, logged to $LOG"
		echo "$DIFF" >> "$LOG"
		echo "" >> "$LOG"
		getAndLogVals
		mkref
	fi
	# short pause so the loop does not spin flat out (the 1 second interval is an assumption)
	sleep 1
done
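The compare-and-re-baseline logic can be tried away from a gateway by swapping cphaprob for a stand-in. The sketch below does that with a hypothetical get_state function that reads a scratch file. The state strings are invented and are not real “cphaprob stat” output, and the loop is limited to three polls so that it finishes.

```shell
#!/bin/bash
# Sketch of the compare-and-re-baseline loop, runnable without a gateway.
# get_state is a hypothetical stand-in for `cphaprob stat`; the state strings are invented.
STATE_FILE="$(mktemp)"
LOG="$(mktemp)"

get_state () {
	cat "$STATE_FILE"
}

echo "member1 Active / member2 Standby" > "$STATE_FILE"
REFERENCE="$(get_state)"
CHANGES=0

for POLL in 1 2 3; do
	# Simulate a failover just before the second poll
	if [ "$POLL" -eq 2 ]; then
		echo "member1 Down / member2 Active" > "$STATE_FILE"
	fi
	POLLED="$(get_state)"
	if [ "$POLLED" != "$REFERENCE" ]; then
		echo "$(date): $REFERENCE -> $POLLED" >> "$LOG"
		REFERENCE="$POLLED"   # re-baseline so the change is logged only once
		CHANGES=$((CHANGES + 1))
	fi
done

echo "changes logged: $CHANGES"
cat "$LOG"
```

The point to notice is the re-baseline step: without it, the third poll would still differ from the original reference and the same failover would be logged again on every pass.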