Monday, February 25, 2013

Linuxha.net and DRBD 8.4

So ... I was all set to perform a standard set of tests on the latest version of Scientific Linux (a RHEL clone) and push out Linuxha.net 1.4.11 - and it failed!

The problem was that the command-line handling in DRBD 8.4, which is what the Scientific Linux packages use, is radically different from that of DRBD 8.3 and below.
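
To illustrate the kind of check this calls for, here is a minimal Python sketch (not the actual Linuxha.net code, which is Perl) that works out which command-line style applies from a DRBD version string. The function names are hypothetical, the sample strings are illustrative, and it assumes the version is reported on a line of the form "version: X.Y.Z", as at the top of /proc/drbd. It also assumes releases after 8.4 keep the 8.4-style handling.

```python
import re


def parse_drbd_version(text):
    """Return the DRBD version found in `text` as a tuple of ints."""
    # Assumption: the version appears on its own line as "version: X.Y.Z ...".
    match = re.search(r"^version:\s*(\d+)\.(\d+)\.(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("no DRBD version line found")
    return tuple(int(part) for part in match.groups())


def uses_84_style_cli(version):
    """True for 8.4 and later; 8.3 and below need the older handling."""
    return version[:2] >= (8, 4)


# Illustrative sample strings, not captured from a real system.
old_sample = "version: 8.3.11 (api:88/proto:86-96)\n"
new_sample = "version: 8.4.2 (api:1/proto:86-101)\n"

for sample in (old_sample, new_sample):
    version = parse_drbd_version(sample)
    style = "8.4-style" if uses_84_style_cli(version) else "8.3-style"
    print(".".join(map(str, version)), "->", style)
```

On a real node the text would come from reading /proc/drbd rather than from a hard-coded string.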

So ... back to the drawing board! The changes are now finished: Linuxha.net works with DRBD 8.4, and should therefore also work with stable DRBD 9 releases when they become available.

I've not yet packaged the code since I need to regression test the changes with older DRBD releases - but fingers crossed the packages should be available within the next 7 days ... and the documentation updated shortly afterwards.

Wednesday, February 6, 2013

Linuxha.net 1.4.11 nearly ready

Whilst I'm still working in the background on the next truecl clustering software release I've taken a bit of a time-out on it to refresh the Linux 2 node DRBD clustering software I called "linuxha.net".

For its 10th birthday I'm updating it to take account of the fact that recent kernels and Linux distributions include the DRBD software, so I no longer need to bundle it. The end result is a further simplification of the installation requirements - always a bonus.

Of course that doesn't make an exciting release does it? Hence I've also taken the opportunity to do the following:

  1. Stripped out many (not all) of the bundled Perl modules - relying on the distribution or the administrator to provide these. The result - more up-to-date versions will probably be used, which should improve security, performance and reliability.
  2. Reorganised the installation structure. Previously the software placed files in about 6 different directory structures; not ideal. Now the software lands in /opt/linuxha14 - though /etc/cluster is still used for configuration and /var/log/cluster for log files.
  3. Simplified the event monitoring system; you typically no longer need to edit a configuration file, since the default "just works".
  4. Improved network handling; the infrastructure better handles network failures and IP fail-over - for example it is possible to run application adds and rebuilds even if the standard node IP addresses are not in operation.
  5. IPv6 support. It is possible to run the software entirely over IPv6 - both for the software itself and the application IP addresses it presents. Of course it will work with IPv4 too if you wish; or both at the same time if you really want to!

As an example, here is a ping6 of an application IP address during a fail-over. About 40 seconds of replies were lost before the address was visible on the remaining node in the cluster.

64 bytes from fec0::192:168:1:200: icmp_seq=220 ttl=64 time=0.164 ms
64 bytes from fec0::192:168:1:200: icmp_seq=221 ttl=64 time=0.201 ms
64 bytes from fec0::192:168:1:200: icmp_seq=222 ttl=64 time=0.133 ms

64 bytes from fec0::192:168:1:200: icmp_seq=266 ttl=64 time=0.281 ms
64 bytes from fec0::192:168:1:200: icmp_seq=267 ttl=64 time=0.123 ms
64 bytes from fec0::192:168:1:200: icmp_seq=268 ttl=64 time=0.112 ms
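
The size of the gap can be read off the icmp_seq values. The following is a small Python sketch (the function name is hypothetical) that finds jumps in the sequence numbers of the output above. It assumes ping6 was run with its default interval of one echo request per second, so each missing sequence number corresponds to roughly one second.

```python
import re

# The six reply lines shown above.
PING_OUTPUT = """\
64 bytes from fec0::192:168:1:200: icmp_seq=220 ttl=64 time=0.164 ms
64 bytes from fec0::192:168:1:200: icmp_seq=221 ttl=64 time=0.201 ms
64 bytes from fec0::192:168:1:200: icmp_seq=222 ttl=64 time=0.133 ms
64 bytes from fec0::192:168:1:200: icmp_seq=266 ttl=64 time=0.281 ms
64 bytes from fec0::192:168:1:200: icmp_seq=267 ttl=64 time=0.123 ms
64 bytes from fec0::192:168:1:200: icmp_seq=268 ttl=64 time=0.112 ms
"""


def find_gaps(ping_text):
    """Return (last_seen, next_seen, missing_count) for each jump in icmp_seq."""
    seqs = [int(value) for value in re.findall(r"icmp_seq=(\d+)", ping_text)]
    gaps = []
    for previous, current in zip(seqs, seqs[1:]):
        missing = current - previous - 1
        if missing > 0:
            gaps.append((previous, current, missing))
    return gaps


for previous, current, missing in find_gaps(PING_OUTPUT):
    print(f"{missing} replies missing between icmp_seq={previous} and icmp_seq={current}")
```

For the output above this reports 43 missing replies between icmp_seq=222 and icmp_seq=266, which at one request per second matches the roughly 40 second outage.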

A snapshot of the cluster application whilst a single node was running:

# clstat -A apache
Cluster: test - UP

 Application       Node      State  Runnnig  Monitor  Stale  Fail-over?
      apache   lubuntu2    STARTED  0:00:09  Running      2          No

 File Systems

 Mount Point              Valid   Type      State   % Complete  Completion
 /apache/admin            local   drbd     Unsync
 /apache/data             local   drbd     Unsync

 General Monitors

            Type          Name    Status
      FS Monitor     fsmonitor   Running


A subset of the DRBD configuration - showing it running over IPv6 in this cluster:



protocol C;
_this_host {
    device            minor 0;
    disk            "/dev/apachevg/apache2";
    meta-disk        "/dev/apachevg/apache2_meta" [ 0 ];
    address            ipv6 [fec1::192:168:100:25]:9906;
}
_remote_host {
    address            ipv6 [fec1::192:168:100:26]:9906;
}
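
As a quick sanity check on address lines like the ones above, here is a minimal Python sketch using the standard ipaddress module. The function name and the regular expression are my own assumptions and cover only the "address ipv6 [addr]:port;" form shown in this excerpt, not the full DRBD configuration syntax.

```python
import ipaddress
import re


def parse_drbd_ipv6_address(line):
    """Split an 'address ipv6 [addr]:port;' line into (IPv6Address, port)."""
    match = re.search(r"address\s+ipv6\s+\[([0-9A-Fa-f:]+)\]:(\d+)\s*;", line)
    if match is None:
        raise ValueError("not an IPv6 address line: %r" % line)
    # IPv6Address raises ValueError itself if the address is malformed.
    return ipaddress.IPv6Address(match.group(1)), int(match.group(2))


# The two address lines from the excerpt above.
for line in (
    "    address            ipv6 [fec1::192:168:100:25]:9906;",
    "    address            ipv6 [fec1::192:168:100:26]:9906;",
):
    address, port = parse_drbd_ipv6_address(line)
    print(address, port, "site-local" if address.is_site_local else "other")
```

Both addresses fall within the site-local fec0::/10 range, which is why is_site_local is true for them.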


So when will it be released? Soon is the best answer I can give. I need to validate the functionality across recent Red Hat Linux configurations and update the documentation - including a 1.4.10 to 1.4.11 upgrade guide.