5. Best practices

This chapter contains tips, tricks and examples of how to make good use of synctool.

5.1 Use logical group names

synctool allows you to use node names as extensions on files in the repository. This is nice because it lets you easily make an exception for a single node. However, it is much better practice to put that node in a group of its own, and let it be the only member of that group. Now label the file as belonging to that group, and you’re done.

Bad:

overlay/all/etc/hosts.allow._all
overlay/all/etc/hosts.allow._node1
overlay/all/etc/motd._all
overlay/all/etc/motd._node1

Good:

overlay/mycluster/etc/hosts.allow._all
overlay/mycluster/etc/hosts.allow._login
overlay/mycluster/etc/motd._all
overlay/mycluster/etc/motd._login

The advantage is that you will be able to shift the service to another node simply by changing the synctool configuration, rather than having to rename all files in the repository.
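
For example, in synctool.conf you might let node1 be the only member of the group login (node and group names here are made up):

node node1 login compute ipaddress:node1.mycluster.org
node node2 compute ipaddress:node2.mycluster.org

Shifting the login service to node2 is then just a matter of moving the group login to the node2 line; the repository stays untouched.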

5.2 Future planning: Be mindful of group ‘all’

synctool has this great group ‘all’ that applies to all nodes. One day, however, you decide to add a new machine to your cluster that has a rather different role than any other. Not surprisingly, synctool will want to apply all files tagged as all to the new node, but in this case that is exactly what you do not want.

The problem is that all is too generic, and the solution is to rename overlay/all/ to something else, such as overlay/common/, or better yet, overlay/subcluster1/. This moves all out of the way, and you can integrate your new node properly.

The lesson here is that overlay/all/ is a nice catch-all directory, but it’s maybe best left unused. It’s perfectly OK for files to be tagged as ._all, but they really should be placed in a group-specific overlay directory.
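
For example, moving the files out of ‘all’ might look like this (group and node names are made up):

mv $overlay/all $overlay/subcluster1

and in synctool.conf, tag the existing nodes with the new group:

node node1 subcluster1 ipaddress:node1.mycluster.org
node node2 subcluster1 ipaddress:node2.mycluster.org

The new, different machine simply does not get the subcluster1 group, so none of those files apply to it.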

5.3 Use group extensions on directories sparingly

synctool allows you to add a group extension to a directory, like so:

$overlay/all/etc._mygroup/

This is a powerful feature. However, it can also make a mess of your repository. If you catch yourself using find all the time to pinpoint the location of a file in the repository, chances are that you are making things harder on yourself than they ought to be.

Maybe it is better structured as:

$overlay/mygroup/etc/

or maybe it is better structured as:

$overlay/all/etc/somefile._all
$overlay/all/etc/somefile._mygroup

The main message is “keep it simple”. Try not to use too many group directories, because they complicate things.

5.4 Do not manage the master node

It is recommended that you do not manage the master node with synctool. Putting the configuration of the master node itself under synctool’s control makes things more complicated than they need to be.

Why is this? Mainly because synctool by default works on all nodes, and it is unnatural when ‘all’ includes the master node too. Imagine calling dsh reboot to reboot all nodes, and pulling the master down with it (in the middle of the run, so you may not even succeed at rebooting all nodes).

Managing the master often means doing double work, too; the config files of the master node tend to differ from the ones on your nodes. It is good practice, though, to label the master node as master, and to simply ignore it:

node n01 master
ignore_group master

It’s also OK to leave the master node out of the configuration altogether. (Here, ‘master’ is a group, not to be confused with the master keyword that defines the master node. Are you still with me?)
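
For reference, the master keyword line looks like this (hostname is made up):

master n01.mycluster.org

Combined with the node n01 master line above, synctool knows which host is the master node, while master on the node line is just an ordinary group label that ignore_group tells synctool to skip.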

If you still want to manage the master with synctool, do as you must. Just be sure to call dsh -X master reboot when you want to reboot your cluster.

5.5 Managing multiple clusters with one synctool

Managing just one cluster with one synctool is really easy: the repository contains only files that apply to that one cluster. In reality, however, you may have a single management host for multiple clusters, or several subclusters within a single cluster.

Managing such a setup with synctool used to be hard and hackish, but since version 6 it is only a matter of using groups in the right way.

Add all clusters to the synctool repository just as you would add more nodes. Create a group for each (sub)cluster. For each cluster group, add an overlay directory named after that group, so you can handle the clusters independently whenever you wish. Your repository will look like this:

$overlay/all/
$overlay/cluster1/
$overlay/cluster2/
$overlay/cluster3/
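
In synctool.conf, tag each node with the group of the cluster it belongs to, along these lines (all names are made up):

node c1-n01 cluster1 ipaddress:c1-n01.your.domain
node c2-n01 cluster2 ipaddress:c2-n01.your.domain
node c3-n01 cluster3 ipaddress:c3-n01.your.domain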

If you reinstall systems a lot with different operating systems, it may be a good idea to create per-OS groups. This also helps when upgrading to new OS releases.

$overlay/wheezy/
$overlay/centos64/
$overlay/sles11sp2/

Note that files under these directories can still be marked ._all; synctool will select the correct file as long as you tag the node with the right group.
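
For example, a node tagged like this (names are made up):

node n101 cluster1 wheezy ipaddress:n101.your.domain

will receive a file such as:

$overlay/wheezy/etc/apt/sources.list._all

while nodes in the centos64 group never see it.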

Decide for yourself what files should go under what directory, and what layout works best for you.

5.6 Use a tiered setup for large clusters

If you have a large cluster consisting of hundreds or thousands of nodes (or maybe more), you will run into a scaling issue on the master node. synctool doesn’t care how many nodes you manage with it, but a synctool run across a large number of nodes takes considerable time. You can increase the num_proc parameter to make synctool do more work in parallel, but this puts more load on your master node. To manage such a large cluster with synctool, a different approach is needed.
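
num_proc is set in synctool.conf; the value below is only an illustration:

num_proc 32

With this setting, synctool addresses up to 32 nodes at a time.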

The solution is to make a tiered setup. In such a setup, the master node syncs to other master nodes (for example, the first node in a rack), which in turn sync to subsets of nodes. Script it along these lines:

#! /bin/bash

# racks to manage (example names; adjust to your setup)
# the first node of each rack acts as rack master
RACKS="rack1 rack2 rack3"

for rack in $RACKS
do
    (
        # give the rack master a full copy of the repository
        rsync -a --delete /opt/synctool/ ${rack}-n1:/opt/synctool/

        # run synctool on the rack master for its subset of nodes
        dsh -n ${rack}-n1 --no-nodename \
          /opt/synctool/bin/synctool -g $rack "$@"
    ) &
done
wait

So the master node syncs to the ‘rack masters’, and the rack masters in turn run synctool on their own subset of nodes. In the config, the nodes are grouped by rack. The --no-nodename option is passed to dsh to make the output come out right. Note that you still need to manage the rack masters themselves with synctool, from the master node.

A slightly different solution is to make use of slave nodes: the master syncs full copies of the repository only to the slaves, and the slaves in turn manage the nodes. This requires having multiple config files (e.g. one per rack) and scripting it so that the correct config file is used for each rack.

# sync the full repository to the slave nodes
synctool -c slaves.conf "$@"

# have each slave manage its own rack, using a per-rack config
for rack in $RACKS
do
    dsh -n ${rack}-n1 --no-nodename \
      synctool -c confs/${rack}.conf "$@"
done

This tip is mentioned here mostly for completeness; I recommend a setup like this only if you are truly experiencing problems due to the scale of your cluster. There are also security implications to consider when giving nodes a full copy of the repository; whether running like this is acceptable depends on your situation.

5.7 Manage hosts behind a gateway

Since synctool does everything over ssh, you can also manage hosts that are not directly reachable. Imagine this setup:

synctool-master-node
    |
Internet
    |
gateway.your.domain
    |
    + privatenode1
    + privatenode2

You need to set up your ssh connection as follows in /root/.ssh/config:

Host *.intra.your.domain
    ProxyCommand ssh gateway.your.domain -W %h:22

Add your hosts in synctool.conf:

node privatenode1 group1 group2 ipaddress:privatenode1.intra.your.domain
node privatenode2 group1 group2 ipaddress:privatenode2.intra.your.domain

Of course, this also requires a proper DNS setup for the intra.your.domain zone at gateway.your.domain.
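
If you do not want to run a DNS server for the intra zone, entries in /etc/hosts on the gateway can serve the same purpose, for example (addresses are made up):

10.0.0.11    privatenode1.intra.your.domain privatenode1
10.0.0.12    privatenode2.intra.your.domain privatenode2

The gateway is the machine that resolves the %h in the ProxyCommand, so it is the one that needs to know the private addresses.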