This is an attempt to find a good way of monitoring and logging what is going on in the HA module. It’s a work in progress, so please feel free to contribute.
Smartcenter
The first script below is triggered by a user-defined alert and appends each alert to log files in the /var/tmp/clusterxl_alert directory on the Smartcenter. Using the cron job, a daily email can be sent with a summary of the day’s alerts. This was posted to CPUG by yheffen: https://www.cpug.org/forums/clustering-security-gateway-ha-clusterxl/9992-ha-failover-log-files.html. Originally written for the Korn shell, it works equally well in bash.
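#!/bin/bash
# Append every alert received on stdin to a running log and to a daily log.

DIR="/var/tmp/clusterxl_alert"
DAILY_LOG="$DIR/alert_daily.log"
LOG="$DIR/alert.log"

# Create a log file with mode 644 if it does not exist yet
mklog () {
    if [ ! -f "$1" ]; then
        touch "$1"
        chmod 644 "$1"
    fi
}

# Make sure the log directory exists before writing to it
mkdir -p "$DIR"
mklog "$LOG"

# One alert per line; -r keeps backslashes in the alert text intact
while read -r ALERT; do
    echo "$ALERT" >> "$DAILY_LOG"
    echo "$ALERT" >> "$LOG"
done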
Enter the path to the script as one of the “UserDefined scripts” in the “Policy > Global Properties > Log and Alert > Alert Commands” window. Then, in the “ClusterXL” window of the cluster object’s properties, select this User Defined Alert in the “Tracking” section.
Cron job code:
0 5 * * * [ -f /var/tmp/clusterxl_alert/alert_daily.log ] && mailx -s "ClusterXL Alerts" me@example.com < /var/tmp/clusterxl_alert/alert_daily.log && rm /var/tmp/clusterxl_alert/alert_daily.log
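One caveat with the one-liner is that an alert arriving between the mailx and the rm is deleted without ever being mailed. Below is a minimal sketch of the same job as a script that moves the daily log aside before mailing it. The function name send_daily_summary, the MAILER and RECIPIENT variables and the ".sending" suffix are assumptions made for illustration; they are not part of the original post.

```shell
#!/bin/bash
# Sketch only: mail the daily ClusterXL alert log without losing late alerts.
# DIR matches the alert script above; everything else here is hypothetical.
DIR="${DIR:-/var/tmp/clusterxl_alert}"
DAILY_LOG="$DIR/alert_daily.log"
RECIPIENT="${RECIPIENT:-me@example.com}"

send_daily_summary () {
    # Nothing to send if the daily log is missing or empty
    [ -s "$DAILY_LOG" ] || return 0

    # Move the log aside first, so alerts that arrive while the mail is
    # being sent start a fresh daily log instead of being removed
    local snapshot="$DAILY_LOG.sending"
    mv "$DAILY_LOG" "$snapshot" || return 1

    if "${MAILER:-mailx}" -s "ClusterXL Alerts" "$RECIPIENT" < "$snapshot"; then
        rm -f "$snapshot"
    else
        # Sending failed: put the alerts back so the next run retries them
        cat "$snapshot" >> "$DAILY_LOG" && rm -f "$snapshot"
        return 1
    fi
}

send_daily_summary
```

Saved under a path of your choosing and made executable, the script would take the place of the command part of the cron entry, with the "0 5 * * *" schedule left unchanged.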
Security Gateway
This next script, which is very quick and dirty, monitors the interfaces using “cphaprob -a if”. It polls every 2 seconds, writes the result to a file (ha_polled.txt) and compares it against a reference file (ha_ref.txt), which is created when the script first starts. If a difference is found, it is logged to the ha_alert.log file. There are better ways to do this but as I said, it’s quick and dirty 🙂
#!/bin/bash

# variables
DIR="/var/tmp"
REFERENCE="$DIR/ha_ref.txt"
POLLED="$DIR/ha_polled.txt"
LOG="$DIR/ha_alert.log"

# functions
mkref () {
    cphaprob -a if > "$REFERENCE"
}

mkpoll () {
    cphaprob -a if > "$POLLED"
}

# main process
# make reference file
mkref

echo "Entering polling loop, use ctrl-c or"
echo "\"kill \$(pgrep ${0##*/})\" from a different terminal to exit"
echo

# Poll every 2 seconds and compare until ctrl-c.
# If status changes log and then make new reference data
while true; do
    mkpoll
    DIFF=$(diff "$REFERENCE" "$POLLED")
    if [ "$DIFF" != "" ]; then
        echo "Change logged to $LOG"
        echo "" >> "$LOG"
        echo "$DIFF" >> "$LOG"
        mkref
    fi
    # sleep on every pass, not only after a change is logged
    sleep 2
done
Running this as admin in expert mode with an ampersand puts the process in the background. Prefixing it with nohup keeps it running even if the terminal is disconnected:
[Expert@gw]# nohup ./ha_monitor.sh &
One issue here is that if an interface is down, “cphaprob -a if” shows the number of seconds it has been down for:
[Expert@gw]# cphaprob -a if
Required interfaces: 4
Required secured interfaces: 2
eth0    UP    sync(secured), multicast
eth1    Inbound: DOWN (4.7 secs)  Outbound: DOWN (5 secs)    sync(secured), multicast
eth2    UP    non sync(non secured), multicast
eth3    UP    non sync(non secured), multicast
The script will therefore see a discrepancy on every poll as the seconds count increases, and will create a log entry every 2 seconds until the interface comes back up. Like I said, quick, dirty and a work-in-progress 🙂
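One possible way around this is to strip the "(N secs)" counters before comparing, so that only real state changes register. The following is a minimal sketch, assuming the counters always look like the ones in the output above. The strip_timers name and the two sample lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: remove "(N secs)" counters from cphaprob output.
# Reads stdin, writes stdout. Basic regex only, so it should also work
# with older versions of sed.
strip_timers () {
    sed 's/ *([0-9.]* secs)//g'
}

# Two sample lines modelled on the output above, taken 2 seconds apart
first='eth1 Inbound: DOWN (4.7 secs) Outbound: DOWN (5 secs) sync(secured), multicast'
second='eth1 Inbound: DOWN (6.7 secs) Outbound: DOWN (7 secs) sync(secured), multicast'

# With the counters removed the two polls compare as equal
if [ "$(echo "$first" | strip_timers)" = "$(echo "$second" | strip_timers)" ]; then
    echo "no state change"
else
    echo "state changed"
fi
```

In the monitoring script, the output of “cphaprob -a if” would be piped through strip_timers inside mkref and mkpoll before it is written to the reference and polled files.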
EDIT:
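New script, which compares the output of “cphaprob stat” rather than “cphaprob -a if”, so the down-time counter no longer triggers a log entry on every poll: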
#!/bin/bash

# variables
HOSTNAME=$(hostname)
DIR="/var/tmp"
LOG="$DIR/${HOSTNAME}_hamon.log"

# functions
mkref () {
    echo "Making new reference .." >> "$LOG"
    REFERENCE="$(cphaprob stat)"
    echo "Done" >> "$LOG"
    echo "" >> "$LOG"
}

mkpoll () {
    POLLED="$(cphaprob stat)"
}

# Write the current cluster state, device list and interface state to the log
getAndLogVals () {
    CPHAPROBSTAT=$(cphaprob stat)
    CPHAPROBLIST=$(cphaprob list | grep -v "Time since" | grep -v "Registration number" | grep -v "Timeout: none")
    CPHAPROBAIF=$(cphaprob -a if)
    {
        echo ""
        echo "cphaprob stat:"
        echo "--------------"
        echo "$CPHAPROBSTAT"
        echo ""
        echo "cphaprob list:"
        echo "--------------"
        echo "$CPHAPROBLIST"
        echo ""
        echo "cphaprob -a if:"
        echo "---------------"
        echo "$CPHAPROBAIF"
        echo ""
    } >> "$LOG"
}

# main
if [ -f "$LOG" ]; then
    echo "Removing old log file .."
    rm "$LOG"
fi

echo "Starting logging at $(date)" >> "$LOG"
echo "" >> "$LOG"

# Record original vals to the log
getAndLogVals

# get reference vals
mkref

echo "Monitoring Failover status, use ctrl-c or \"kill \$(pgrep ${0##*/})\" from a different terminal to exit"

# Poll continuously and compare until ctrl-c. If status changes, log and get new reference data
while true; do
    mkpoll
    if [ "$POLLED" != "$REFERENCE" ]; then
        DIFF="$REFERENCE / $POLLED"
        {
            echo ""
            echo "============================================================================="
            echo ""
            date
            echo ""
        } >> "$LOG"
        echo "HA Status Change detected, logged to $LOG"
        echo "$DIFF" >> "$LOG"
        echo "" >> "$LOG"
        getAndLogVals
        mkref
    fi
done
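The loop above polls as fast as it can, which keeps a CPU busy running cphaprob. If that is a concern, the compare-and-update logic can be throttled. Here is a rough sketch of the idea as a standalone function. The names watch_status, STATUS_CMD and INTERVAL, and the optional poll limit, are assumptions for illustration and do not appear in the script above.

```shell
#!/bin/bash
# Sketch only: run a status command at a fixed interval and report changes.
# STATUS_CMD and INTERVAL are hypothetical knobs with illustrative defaults.
STATUS_CMD="${STATUS_CMD:-cphaprob stat}"
INTERVAL="${INTERVAL:-2}"

watch_status () {
    local max_polls="${1:-0}"   # 0 means poll until interrupted
    local reference polled count=0

    # Take the initial reference before entering the loop
    reference="$($STATUS_CMD 2>&1)"

    while [ "$max_polls" -eq 0 ] || [ "$count" -lt "$max_polls" ]; do
        polled="$($STATUS_CMD 2>&1)"
        if [ "$polled" != "$reference" ]; then
            echo "$(date): status changed"
            echo "$reference / $polled"
            # The new state becomes the reference for the next comparison
            reference="$polled"
        fi
        count=$((count + 1))
        sleep "$INTERVAL"
    done
}

# Example call (left commented out so the sketch defines the function only):
# watch_status >> "/var/tmp/$(hostname)_hamon.log"
```

The trade-off is that a state that flips and flips back within one interval goes unnoticed, so the interval should be shorter than the shortest failover you care about.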