Table of contents
1. What is synctool
2. Installation
2.1 Installation dependencies
2.2 Passwordless SSH
2.3 Installing the software
2.4 synctool configuration: nodes and groups
2.5 Testing with dsh
2.6 Your first synctool run
2.7 Client installation
3. Using synctool
3.1 Populating the repository
3.2 Adding actions to updates
3.3 Other useful options
3.4 Templates
3.5 Purge directories
3.6 The order of operations
3.7 dsh-pkg, the synctool package manager
3.8 Ignoring them: I’m not touching you
3.9 Backup copies
3.10 Logging
3.11 About symbolic links
3.12 Slow updates
3.13 Checking for updates
3.14 Running tasks with synctool
3.15 Multiplexed connections
5. Best practices
5.1 Use logical group names
5.2 Future planning: Be mindful of group ‘all’
5.3 Use group extensions on directories sparingly
5.4 Do not manage the master node
5.5 Managing multiple clusters with one synctool
5.6 Use a tiered setup for large clusters
5.7 Manage hosts behind a gateway
1. What is synctool
synctool is a tool that can help you administer your cluster of computers.
Its primary function is keeping configuration files ‘in sync’,
i.e.: as they ought to be. Its core business is copying configuration files
to groups (or classes) of computers within a cluster and comparing such
files with a normative copy that you keep in a repository.
The repository, by the way, is not some database system, but an ‘overlay’
directory tree in a file system, that looks very much like the directories
of the managed target systems. The only things missing from the repository
are the files and directories that you do not want or need synctool to
manage. In the repository, you can manage directories with conventional UNIX tools (cp, mv, mkdir) or any tool you like, and you can edit files with the editor of your choice.
There are other tools in existence that do the same thing as synctool, and ironically, none of them are as easy to understand and use as synctool. Perhaps this is because other tools try to do the same thing among many other things as well. synctool does not try to be an all-encompassing system administration tool, and it does not have its own little scripting language to define your system in. It does not strive to automate all aspects of the system administrator’s work. Rather, it focuses on its core business only and concentrates on doing that very well. This is very much in line with the traditional UNIX design philosophy, and with common sense. The powerful set of now common shell tools grew by adding commands that were designed to do only specific tasks very well and to be used easily in combination with other tools that specialize in other tasks.
Because of that design philosophy:
- synctool integrates very easily into existing system administration practices as an add-on tool, specifically for configuration file management. It does not interfere with other things and does not need much either: it is written in Python, and it uses the power of rsync and ssh to distribute files.
- It is possible to use synctool in the style that suits you best: have it warn you whenever things are out of ‘sync’, or have it do automated repairs of deviations.
- It is even possible to manage some files with synctool and leave other files to other mechanisms; what is not represented in the synctool repository is not managed by it.
Although synctool has many command-line options, its set of core functions is very small and easy to understand. There is not that much you need to know to use it, so there is virtually no learning curve to get you started with synctool.
In addition, synctool simplifies things by working with the following concepts:
- Some clusters are more homogeneous than others. To handle differentiation within a cluster, a host can be part of one or more logical groups;
- Files are designated to a group by means of a filename extension;
- The ‘overlay’ directory tree contains the files that are ‘synced’ to the target hosts;
- When certain files are updated, you will want to execute a script (e.g., to run 'service daemon restart'). synctool has a mechanism for this. You can make synctool more powerful by writing plugin scripts that run the commands you want whenever a particular file has been updated.
synctool manages configuration files, not processes, and not full system installations. However, synctool comes with handy tools to run commands across the cluster and do synchronized updates of software packages.
synctool does not hide UNIX from you. Used cleverly, synctool is a very powerful tool.
synctool started in 2003 and has since been in use with great success, doing real work at big computing sites. Hopefully, it will be of some value to you as well.
2. Installation
In synctool terminology, a node is a host, a computer in a group of computers. A group of computers is called a cluster.
2.1 Installation dependencies
synctool depends on a number of (fairly standard) programs:
- python3 version 3.6 or better
- ssh, preferably OpenSSH version 5.6 or better
- rsync
- ping, or you can configure fping later
- markdown and smartypants, but only if you want to install this documentation as HTML pages
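If you want to check these dependencies from a script, the following minimal Python sketch shows one way to do it. It is not part of synctool. The name missing_dependencies and the program list are assumptions made for illustration. It only checks that each program can be found on the PATH, so it does not verify versions such as OpenSSH 5.6.

```python
import shutil
import sys

# Programs from the list above; use "fping" instead of "ping" if you configure that.
REQUIRED_PROGRAMS = ["ssh", "rsync", "ping"]


def missing_dependencies(programs=REQUIRED_PROGRAMS, which=shutil.which):
    """Return a list of unmet dependencies (hypothetical helper, not a synctool API).

    Only checks that each program is on the PATH; versions are not verified.
    """
    missing = [prog for prog in programs if which(prog) is None]
    if sys.version_info < (3, 6):
        missing.insert(0, "python3 version 3.6 or better")
    return missing


if __name__ == "__main__":
    for item in missing_dependencies():
        print("missing:", item)
```

The which parameter is there so that the check can be tested without touching the real PATH.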
If you got all that, it’s on to the next section.
2.2 Passwordless SSH
synctool requires passwordless SSH as root from the master node to each cluster node. If you need more information on how to set this up, please see the SSH documentation or just google around. Here are a few tips:
- use an SSH keypair
- or use hostbased authentication, also for root
- set PermitRootLogin without-password in sshd_config on the nodes (newer OpenSSH releases spell this value prohibit-password)
- use ssh-keyscan to create /etc/ssh/ssh_known_hosts
- run sshd only on the internal network interface to secure your system; configure ListenAddress appropriately
- in general, passwordless SSH from any cluster node to your master node should not work or be allowed; at any rate, synctool does not need it
If you want extra security, use a passphrase on the keypair and employ ssh-agent. Use ssh-add with a timeout (option -t).
For sites with extra tight security, it is possible to configure ssh to run only specific (synctool) commands, or maybe you want to change the ssh_cmd in synctool’s configuration so that it runs a different command, one that does suit your security needs.
When passwordless SSH as root works, proceed to installing the software.
2.3 Installing the software
To install synctool on the master node, run setup.sh like so:
# ./setup.sh --installdir=/opt/synctool
The default location is /opt/synctool, which is a good place to put it.
Note that synctool requires an ‘installdir’ directory of its own. The installdir is not the same as a prefix; whatever you do, do not install synctool directly under /usr or /usr/local. Use /usr/synctool or /usr/local/synctool instead, or better, stick with the default location.
The rest of the documentation assumes the default /opt/synctool.
setup.sh creates the following directory structure:
/opt/synctool/bin/ synctool commands
/opt/synctool/sbin/ 'system' programs
/opt/synctool/etc/ configuration files
/opt/synctool/lib/ libraries, modules
/opt/synctool/lib/synctool/
/opt/synctool/lib/synctool/main/
/opt/synctool/lib/synctool/pkg/
/opt/synctool/doc/ documentation
/opt/synctool/scripts/ place to store your scripts
/opt/synctool/var/ repository directory
/opt/synctool/var/overlay/
/opt/synctool/var/delete/
/opt/synctool/var/purge/
The doc/ directory contains a copy of this documentation. You may build the HTML documentation from the plain text sources by running setup.sh with --build-docs.
The following synctool commands will be made available in /opt/synctool/bin/:
synctool Main command
dsh Run remote commands
dsh-pkg Upgrade or install packages
dsh-ping Check whether nodes are up
dsh-cp Copy files to nodes
synctool-client Only run on target nodes
synctool-client-pkg Only run on target nodes
synctool-config Inspect the configuration
synctool-template Useful command for .post scripts
Tip: Add /opt/synctool/bin to your PATH.
2.4 synctool configuration: nodes and groups
Copy the synctool.conf.example file to /opt/synctool/etc/synctool.conf. Edit synctool.conf, adjusting it as needed.
The file synctool.conf describes what your cluster looks like: what nodes have what roles, and how synctool can contact them.
Think a bit about what role each machine has. There is no need to go into
great depth right now; you can always adjust the configuration later.
node n1 group1 group2 ipaddress:machine-n01
The nodename is the name by which synctool knows the node. In general it is the short hostname of the host, but in fact it can be anything you like. The nodename has nothing to do with hostnames or DNS entries.
The ipaddress specifier tells synctool how to contact the node; this can be an IP address or a DNS name of the host you wish to contact. In clusters, there is often a management network interface; configure its IP address here. The ipaddress specifier is optional and only needed if the nodename does not exactly match the DNS name for contacting the remote host.
Directly following the node name, you can list groups. synctool uses the term ‘group’, but you can also think of them as node properties. You can make up as many different properties as you like. You can split long lines by ending them with a backslash:
node n101 workernode plasma mathworks solar \
fsmounted backup debian ipaddress:if0-n101
Mind that in practice, synctool repositories are easiest to maintain with as few groups as possible. Make sure to use logical names for logical groups, and use a top-down group structure. Make things easy on yourself.
If you have many nodes that all share the same long list of groups, the groups may be abbreviated by defining a compound group. This compound group must be defined before defining the nodes:
group wn workernode plasma mathworks solar \
fsmounted backup
node n101 wn debian ipaddress:if0-n101
You have to add a node definition for each and every node in your cluster. If your nodes are neatly numbered (and for large clusters, they often are), you can make use of node ranges and IP address sequences, like so:
node n[001-100] wn debian ipaddress:if0-n[001]
node n[101-200] wn debian ipaddress:192.168.1.[20]
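The following minimal Python sketch shows how node ranges and address sequences could be expanded and paired. It illustrates the notation under some assumptions and is not synctool's parser:

- The bracketed number in the address is the starting value for the first node and increases by one per node.
- The number of digits in a start value sets the zero padding.
- IP octets are not checked for overflow past 255.

All function names are hypothetical.

```python
import re

# "n[001-100]" -> prefix "n", start "001", end "100", suffix ""
NODE_RANGE = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>.*)$")
# "192.168.1.[20]" -> prefix "192.168.1.", start "20", suffix ""
ADDR_SEQUENCE = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<start>\d+)\](?P<suffix>.*)$")


def expand_node_range(spec):
    """Expand 'n[001-003]' into ['n001', 'n002', 'n003'], keeping the zero padding."""
    m = NODE_RANGE.match(spec)
    if m is None:
        return [spec]
    width = len(m["start"])
    return [f"{m['prefix']}{i:0{width}d}{m['suffix']}"
            for i in range(int(m["start"]), int(m["end"]) + 1)]


def expand_sequence(spec, count):
    """Produce `count` addresses, counting up from the bracketed start value."""
    m = ADDR_SEQUENCE.match(spec)
    if m is None:
        if count == 1:
            return [spec]
        raise ValueError(f"{spec!r} is not a sequence but {count} nodes need an address")
    width = len(m["start"])
    start = int(m["start"])
    return [f"{m['prefix']}{start + i:0{width}d}{m['suffix']}" for i in range(count)]


def expand_node_line(name_spec, addr_spec):
    """Pair each expanded node name with its address (hypothetical helper)."""
    names = expand_node_range(name_spec)
    return dict(zip(names, expand_sequence(addr_spec, len(names))))
```

Under these assumptions, expand_node_line("n[101-103]", "192.168.1.[20]") maps n101, n102 and n103 to 192.168.1.20, .21 and .22.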
If you do have the luxury of a high performance shared filesystem on your cluster, you may put /opt/synctool/ on there and add rsync:no to the node definition lines in the config file to tell synctool not to run rsync.
Mind that there are certain security implications with having a shared filesystem between management and production nodes.
Next, you have to tell synctool which node is the master management node. This is done by setting master to the fqdn (fully qualified domain name) of the management host.
master n1.mycluster.org
If you don’t know what the fqdn is, you can get it by running the command:
synctool-config --fqdn
If you want to manage the master node itself with synctool, you should also define it as a node. It is a matter of taste, but it is perhaps better not to do so. If you choose not to manage the master node, it may be omitted from the configuration. You may also explicitly exclude it:
node n1 master hostname:n1.mycluster.org
ignore_node n1
Besides a master node, you may also define slave nodes. Slaves are cold standbys that get full copies of the synctool repository. A slave may be used as a fallback in case your management host breaks down. Since there can be only one master node in a synctool cluster, slaves must be enabled ‘by hand’ by editing the config file and changing the master definition.
Previous versions of synctool had a masterdir setting. It no longer exists; the overlay directory must now reside under the synctool root, under /opt/synctool/var/.
You can test your synctool.conf with the command synctool-config. It is more exciting, however, to test with dsh and actually run commands on the cluster.
2.5 Testing with dsh
After filling in a couple of nodes in synctool.conf, try the command dsh-ping to see if the nodes are ‘up’. If they are, try running the commands dsh hostname, dsh uptime, or dsh date.
If you correctly set up passwordless SSH, dsh should run the commands on every node without problems or manual intervention. It is important that this works before proceeding.
Some (mostly IBM) systems already have a dsh command. Be mindful to start the correct dsh command.
See section 3.15 for a trick that greatly speeds up synctool and dsh using OpenSSH’s multiplexed connections capability.
2.6 Your first synctool run
Now that you have a rough setup on the master node, try running synctool
to a single node:
synctool -n nodename
There should be some output message saying DRY RUN. This means that synctool is now working. You can try running synctool across all nodes:
synctool
Check that every node responds. If it doesn’t, go back to the step where we tested the setup using dsh.
When synctool works for every node, the basic setup is done and you can start filling your repository with useful files.
2.7 Client installation
As you may have noticed, we never installed any client software on the nodes. There is no client installation step; the master node automatically updates synctool on the client nodes. The binaries needed for this are located under /opt/synctool/sbin/, and this directory gets synced to the nodes with rsync every time you run synctool.
3. Using synctool
The main power of synctool is the fact that you can define logical groups, and you can add these to a filename as a filename extension. This results in the file being copied only if the node belongs to that group. The groups a node is in are defined in the synctool.conf file.
In the configuration file, the nodename is associated with one or more groups.
The nodename itself can also be used as a group to indicate that a file
belongs to that node.
Under the synctool root there are these interesting directories:
/opt/synctool/var/overlay/
/opt/synctool/var/delete/
/opt/synctool/var/purge/
Together, these are referred to as ‘the repository’.
The overlay/ tree contains files that have to be copied to the target nodes. When synctool detects a difference between a file on the system and a file in the overlay tree, the file will be copied from the overlay tree onto the node.
The delete/ tree contains files that always have to be deleted from the nodes. Only the filename matters, so it is alright if the files in this tree are only zero bytes in size.
The purge/ tree contains directories that are copied as-is to the nodes; any unmanaged files on the target node (files that should not be there) are deleted.
synctool uses rsync to copy these trees to the node, and afterwards it runs the synctool-client command on that node. Note that it is perfectly possible to run synctool-client on a node by hand, in which case it will check its local copy of the repository. The client by itself will not synchronize with the master repository; synctool works with server push and not client pull.
In old times, synctool was located under /var/lib/synctool/. It worked for me (tm), except that the Filesystem Hierarchy Standard (FHS) has various things to say about it:
- thou shalt put configuration files under an etc/ directory;
- thou shalt not execute programs from the /var partition; /var may be mounted read-only;
- programs that want to keep things together should use /opt.
If you have difficulty getting used to synctool’s new root, try this:
- symlink /var/lib/synctool -> /opt/synctool/var
- export overlay=/opt/synctool/var/overlay ; cd $overlay
3.1 Populating the repository
In the repository you will store all the important system configuration files of the cluster nodes. The overlay directory represents the root directory of the cluster nodes. By assigning an extension to a file in the repository, you can tell synctool what nodes should get what copy of a file. Consider this example:
/opt/synctool/var/overlay/all/
etc/ntp.conf._all
etc/ntp.conf._node1
etc/ntp.conf._wn
Here, worker nodes (nodes tagged with group wn in synctool.conf) will get the file ntp.conf._wn for /etc/ntp.conf. Node node1 is special and gets a different file. All other nodes will get ntp.conf._all.
There is a special group named 'none'. Files with the extension ._none will be copied to no nodes at all. This can be convenient when you temporarily wish to ‘disable’ a file.
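The selection rule fits in a few lines of Python. This is a hypothetical illustration, not synctool's code. It assumes the priority order that the example implies:

1. the node's own name is most specific;
2. then come the node's groups, in the order they are listed in synctool.conf;
3. 'all' comes last.

Files ending in ._none are never selected.

```python
def priority_order(nodename, groups):
    """Most specific first: the node name, its listed groups, then 'all' (assumed order)."""
    return [nodename, *groups, "all"]


def select_file(candidates, priority):
    """Pick the candidate whose group extension ranks highest for this node.

    `candidates` are repository paths for one destination file, such as
    'etc/ntp.conf._wn'. Returns None if no candidate applies to the node.
    """
    best, best_rank = None, len(priority)
    for path in candidates:
        _, sep, group = path.rpartition("._")
        if not sep or group == "none":
            continue  # no group extension, or explicitly disabled
        if group in priority:
            rank = priority.index(group)
            if rank < best_rank:
                best, best_rank = path, rank
    return best
```

With the three ntp.conf files from the example, node1 gets ntp.conf._node1, another worker node gets ntp.conf._wn, and any other node falls back to ntp.conf._all.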
synctool responds to the directory directly under overlay/; it selects this subtree as a candidate when the node has a matching group. For example:
/opt/synctool/var/overlay/wn/
etc/ntp.conf._all
This file will only be used on worker nodes because it resides in the overlay directory specific to the group wn.
Tip: Do not make group-specific overlay directories for each and every group. Instead, think about what subclusters you have, and arrange your repository accordingly. See also chapter 5 on Best Practices.
In synctool version 5, you would configure ‘overlaydir’ and synctool would still consider all overlay directories no matter what name the subdirectory had. In synctool 6 and up, the group is strictly enforced and the subtree is synced to only those nodes that are in the group. Slave nodes are special; they get a full copy of the repository.
To populate the repository, you can scp files from nodes, or you can use synctool’s super convenient upload feature:
synctool -n node1 --upload /etc/ntp.conf
synctool -n node1 -u /etc/ntp.conf
synctool will automatically choose an extension for the file to save. If you disagree and want a different suffix, choose one:
synctool -n node1 --upload /etc/ntp.conf --suffix wn
synctool -n node1 -u /etc/ntp.conf -s wn
synctool will suggest the overlay directory in which to put the file. If you disagree, use:
synctool -n node1 --upload /etc/ntp.conf --overlay mycluster
synctool -n node1 -u /etc/ntp.conf -o mycluster
By default synctool does a dry run. It will not do anything but show what would happen if this were not a dry run. Add -f or --fix to really upload the file.
Now edit the uploaded ntp.conf, make some changes, and run synctool:
root@masternode:/# synctool
node1: DRY RUN, not doing any updates
node1: /etc/ntp.conf updated (file size mismatch)
Again, synctool does a dry run. It shows the file is going to be updated because there is a mismatch in the file size. Should the file size be the same, synctool will calculate an MD5 checksum to see whether the file was changed or not.
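The comparison described above (size first, then an MD5 checksum only when the sizes match) can be sketched in Python. This is an illustration, not synctool's code. The function names and the reason strings "new" and "checksum mismatch" are hypothetical; only "file size mismatch" appears in synctool's output above.

```python
import hashlib
import os


def md5sum(path, blocksize=65536):
    """MD5 hex digest of a file, read in blocks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_contents(repo_file, target_file):
    """Return None if the contents match, else a short reason (hypothetical wording)."""
    if not os.path.exists(target_file):
        return "new"
    if os.path.getsize(repo_file) != os.path.getsize(target_file):
        return "file size mismatch"  # cheap check first
    if md5sum(repo_file) != md5sum(target_file):
        return "checksum mismatch"  # same size, different contents
    return None
```

Checking the size first avoids reading both files whenever their lengths already differ.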
You may want to review your changes before applying them, or inspect the difference between the version in the repository and what’s currently installed on a node:
synctool -n node1 --diff /etc/ntp.conf
synctool -n node1 -d /etc/ntp.conf
This will present a UNIX ‘diff’ of the files. Note that the command takes the destination path on the node (/etc/ntp.conf), not the path of the file in the repository.
To apply the change, you could now run synctool with option --fix.
But maybe it’s better to read on; we are going to have synctool automatically reload ntpd after updating the ntp.conf file.
3.2 Adding actions to updates
Now I would like ntpd to be automatically reloaded after I change the ntp.conf file. This is done by adding a trigger script, known in synctool-speak as a “.post” script.
Make a new file overlay/all/etc/ntp.conf.post and put only this line in it:
service ntp reload
Make the .post script executable: chmod +x ntp.conf.post.
The .post script will be run when the file changes:
root@masternode:/# synctool -f
node1: /etc/ntp.conf updated (file size mismatch)
node1: running command $overlay/all/etc/ntp.conf.post
The .post script is run after synctool has updated the file; likewise, you may also create a .pre script that runs before the update:
root@masternode:/# synctool -f
node1: running command $overlay/all/etc/ntp.conf.pre
node1: /etc/ntp.conf updated (file size mismatch)
node1: running command $overlay/all/etc/ntp.conf.post
The .pre and .post scripts are executed in the directory where the accompanying file resides; in this case /etc/. It is possible to add a group extension to the script, so that you can have one group of nodes perform different actions than another.
The scripts are run with sh -c. Note that /bin/sh is often not the same as bash, so some clever shell scripting tricks may not work. However, you can fix this by including “#!/bin/bash” at the top of the .post script.
In the environment you will find two variables that might be useful:
- SYNCTOOL_NODE is set to the node that we’re running on
- SYNCTOOL_ROOT is set to the directory where synctool lives
So expanding on that, $SYNCTOOL_ROOT/bin/ is the bindir, and the repository is found under $SYNCTOOL_ROOT/var/overlay/.
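The following Python sketch models how a trigger runs according to the text:

- it is started with sh -c;
- the working directory is the directory of the accompanying file;
- SYNCTOOL_NODE and SYNCTOOL_ROOT are set in the environment.

It is an illustration, not synctool's implementation, and run_trigger is a hypothetical name. As with a real .post script, the script file must be executable.

```python
import os
import shlex
import subprocess


def run_trigger(script, target_file, node, root):
    """Run a .pre/.post script roughly the way the text describes (hypothetical helper).

    The working directory is the directory of the file being updated (for
    /etc/ntp.conf that is /etc), not the directory that holds the script.
    """
    env = dict(os.environ, SYNCTOOL_NODE=node, SYNCTOOL_ROOT=root)
    workdir = os.path.dirname(target_file) or "."
    return subprocess.run(
        ["sh", "-c", shlex.quote(script)],
        cwd=workdir,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text output; works on Python 3.6
    )
```

A script that runs echo "$SYNCTOOL_NODE $(pwd -P)" would then print the node name followed by the directory of the target file.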
A .post script for a directory will trigger when any file in that directory changes. This is particularly useful for daemons that have multiple config files in a directory, such as a conf.d directory or /etc/cron.d.
A .pre script for a directory will only trigger if the directory does not exist and will be created.
3.3 Other useful options
The option -q of synctool gives less output:
root@masternode:/# synctool -q
node3: /etc/xinetd.d/identd updated (file size mismatch)
If -q still gives too much output because you have many nodes in your cluster, it is possible to specify -a to condense (aggregate) output.
The condensed output groups together output that is the same for many nodes.
One of my favorite commands is synctool -qa.
You may also use option -a to condense output from dsh, for example:
# dsh -a date
# dsh-ping -a
The option -f or --fix applies all changes. Always be sure to run synctool at least once as a dry run (without -f)!
Mind that synctool does not lock the repository and does not guard against
concurrent use by multiple sysadmins at once. In practice, this hardly ever
leads to any problems.
To update only a single file rather than all files, use the option --single or -1 (that’s a number one, not the letter ell).
You may give multiple --single options to update multiple files at once.
If you want to check what file synctool is using for a given destination file, use option --ref or -r:
root@masternode:/# synctool -q -n node1 -r /etc/resolv.conf
node1: /etc/resolv.conf._somegroup
synctool can be run on a subset of nodes, a group, or even on individual nodes using the options --node or -n, --group or -g, --exclude or -x, and --exclude-group or -X. This also works for dsh and friends, and you may use the range syntax to select a range of nodes.
For example:
# synctool -g batch,sched -X rack8
More examples:
# dsh -n node1,node2,node3 date
# dsh -n node[1-3] date
# dsh -n node[01-10] -x node[05-07] hostname
# dsh -n node[02-10/2,05,07] hostname
Copy a file to three nodes:
# dsh-cp -n node[1-3] patchfile-1.0.tar.gz /tmp
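Here is a minimal Python sketch of the node selection syntax used in these examples. It is an illustration, not synctool's parser. It makes two assumptions, both of which match how the last dsh example reads:

- /2 sets a step size;
- commas inside brackets separate ranges from single numbers.

The function names are hypothetical.

```python
import re

# One bracketed spec: prefix, a body like "02-10/2,05,07", optional suffix.
NODE_SPEC = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\]]+)\](?P<suffix>[^\[\]]*)$")


def expand_spec(spec):
    """Expand one spec such as 'node[02-10/2,05,07]'; padding follows each start value."""
    m = NODE_SPEC.match(spec)
    if m is None:
        return [spec]  # a plain node name
    names = []
    for part in m["body"].split(","):
        span, _, step = part.partition("/")
        low, dash, high = span.partition("-")
        start = int(low)
        end = int(high) if dash else start
        width = len(low)
        for i in range(start, end + 1, int(step) if step else 1):
            names.append(f"{m['prefix']}{i:0{width}d}{m['suffix']}")
    return names


def split_specs(arg):
    """Split 'node1,node[2-3]' on commas that are not inside brackets."""
    specs, depth, current = [], 0, ""
    for ch in arg:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            specs.append(current)
            current = ""
        else:
            current += ch
    if current:
        specs.append(current)
    return specs


def select_nodes(nodes_arg, exclude_arg=""):
    """Apply an -n style and an -x style argument; duplicates dropped, order kept."""
    chosen = dict.fromkeys(n for s in split_specs(nodes_arg) for n in expand_spec(s))
    excluded = {n for s in split_specs(exclude_arg) for n in expand_spec(s)}
    return [n for n in chosen if n not in excluded]
```

Under these assumptions, select_nodes("node[01-10]", "node[05-07]") leaves node01 to node04 and node08 to node10.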
After rebooting a cluster, use dsh-ping to see if the nodes respond to ping yet. You may also do this on a group of nodes:
The option -v gives verbose output. This is another way of displaying the logic that synctool performs:
# synctool -v
node3: checking $overlay/all/etc/tcpd_banner.production._all
node3: overridden by $overlay/all/etc/tcpd_banner.production._batch
node3: checking $overlay/all/etc/issue.net.production._all
node3: checking $overlay/all/etc/syslog.conf._all
node3: checking $overlay/all/etc/issue.production._all
node3: checking $overlay/all/etc/modules.conf._all
node3: checking $overlay/all/etc/hosts.allow.production._interactive
node3: skipping $overlay/all/etc/hosts.allow.production._interactive,
it is not one of my groups
The option --unix produces UNIX-style output. This shows in standard shell syntax just what synctool is about to do.
root@masternode:/# synctool --unix
node3: # updating file /etc/xinetd.d/identd
node3: mv /etc/xinetd.d/identd /etc/xinetd.d/identd.saved
node3: umask 077
node3: cp /var/lib/synctool/overlay/etc/xinetd.d/identd._all
/etc/xinetd.d/identd
node3: chown root.root /etc/xinetd.d/identd
node3: chmod 0644 /etc/xinetd.d/identd
synctool does not apply changes by executing shell commands; all operations are programmed in Python. The option --unix is only a way of displaying what synctool does, and may be useful when debugging.
The option -T produces terse output. In terse mode, long paths are abbreviated in an attempt to fit them on a single line of 80 characters. Terse mode can be made to give colored output through synctool.conf.
root@masternode# synctool -n n1 -T
n1: DRYRUN not doing any updates
n1: mkdir /Users/walter/src/.../testroot/etc/cron.daily
n1: new /Users/walter/src/.../testroot/etc/cron.daily/testfile
n1: exec //overlay/Users/.../testroot/etc/cron.daily.post
Note that these abbreviated paths can still be copy-and-pasted and used with other synctool commands like --single and --diff. synctool will recognize the abbreviated path and expand it on the fly. In the case of any name clashes, synctool will report this and present a list of possibilities for you to consider.
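The following Python sketch shows how an abbreviated path could be matched back to full paths. It is a hypothetical matcher, not synctool's algorithm. It assumes the only abbreviation is a single /.../ that stands for one or more hidden path components. Every match is returned, so a name clash shows up as more than one result.

```python
def expand_abbreviated(abbrev, known_paths):
    """Return every known path that an abbreviated 'head/.../tail' path could stand for.

    More than one result means a name clash that the user has to resolve.
    """
    head, sep, tail = abbrev.partition("/.../")
    if not sep:
        return [abbrev] if abbrev in known_paths else []
    return [
        path for path in known_paths
        if path.startswith(head + "/")
        and path.endswith("/" + tail)
        and len(path) > len(head) + len(tail) + 2  # at least one component was elided
    ]
```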
The option --skip-rsync skips the rsync run that copies the repository from the master to the client node. You may use this option when you are absolutely certain that the master and client are already in sync, for example if you just ran synctool to examine any changes. In general, this option is unnecessary, but it may be efficient if you are working with slow network links or a large synctool repository.
3.4 Templates
For ‘dynamic’ config files, synctool has a feature called templates.
There are a number of rather standard configuration files that (for example) require the IP address of a node to be listed. These are not particularly synctool-friendly. You are free to upload each and every unique instance of the config file in question into the repository; however, if your cluster is large, this does not make your repository look very nice, nor does it make the files any easier to handle. Instead, make a template and couple it with a ._template.post script that calls synctool-template to generate the config file on the node.
As an example, I will use a fictional snippet of a config file, but this trick applies to things like sshd_config with a specific ListenAddress in it, and network configuration files that have static IPs configured.
# fiction.conf._template
MyPort 22
MyIPAddress @IPADDR@
SomeOption no
PrintMotd yes
And the accompanying fiction.conf._template.post script:
#! /bin/sh
IPADDR=`ifconfig en0 | awk '/inet / { print $2 }'`
export IPADDR
/opt/synctool/bin/synctool-template "$1" >"$2"
This example uses ifconfig to get the IP address of the node. You may also use the ip addr command, consult DNS, or you might be able to use synctool-config to get what you need.
The synctool-template command takes the template file (“$1”) as input, and its output is redirected to a newly generated file (“$2”). The “$2” on the last line expands to fiction.conf._nodename.
Hence, synctool generates a new config file in the repository. It does so even on dry runs; you can ask synctool to display a diff of fiction.conf even though it is templated.
Take care not to redirect the output of synctool-template directly over the target file. Doing that is destructive and wrong; it defeats synctool’s dry-run mode and keeps you from being able to review changes, a core function of synctool.
Instead of using synctool-template, you might use the UNIX sed command. If you have multiple variables to replace, synctool-template is easier.
synctool-template accepts variables either from the command line or from the shell environment. As with regular .post scripts, the environment variables SYNCTOOL_NODE and SYNCTOOL_ROOT are also present here.
However, unlike regular .post scripts, template post scripts require a #! hashbang line. This is required for shell arguments (like “$1”, “$2”) to work.
Now, when you want to change the configuration, edit the template file. synctool will fill in the template and detect any difference with the target file.
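The substitution idea behind synctool-template can be sketched in Python. This is not synctool-template itself, and fill_template is a hypothetical name. Two choices in the sketch are assumptions:

- values given as arguments (the command-line case) override the environment;
- placeholders without a value are left untouched.

```python
import os
import re

# Placeholders look like @IPADDR@, as in fiction.conf._template above.
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def fill_template(text, variables=None, environ=None):
    """Replace @NAME@ placeholders (a sketch of the idea, not synctool-template).

    Values from `variables` (think: command line) override the environment.
    Unknown placeholders are left as they are; that choice is an assumption.
    """
    values = dict(os.environ if environ is None else environ)
    values.update(variables or {})
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)
```

Filling "MyIPAddress @IPADDR@" with {"IPADDR": "10.0.0.5"} yields "MyIPAddress 10.0.0.5".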
Template files and template post scripts can have group extensions to select different templates for certain groups of nodes.
If you want to automatically reload or restart a service after updating fiction.conf, you’ll also have to implement a regular .post script for that: fiction.conf.post.
3.5 Purge directories
In the previous sections we saw how you can use the overlay/ and delete/ trees to manage your cluster. synctool has a third mechanism of syncing files, and it works with the purge/ tree. Purge directories are great for mirroring entire directory trees to groups of nodes.
Unlike with the overlay/ tree, files in the purge/ tree do not have group extensions. Instead, synctool will copy the entire subtree and it will delete any files on the target node that do not reside in the source tree. So, it will make a perfect mirror of the source under purge/.
To populate the purge/ tree, use --upload with the --purge option:
# synctool -n n1 --upload /usr/local --purge compute
# synctool -n n1 -u /usr/local -p compute
In this example, we want to upload the entire /usr/local tree from node n1 to the repository directory /opt/synctool/var/purge/compute/.
Afterwards, all compute nodes will get /usr/local synced via the purge mechanism by running synctool -f.
Purging is a blunt but effective means to synchronise directory trees. Mind that it will delete data that is not supposed to be there, so be careful with this feature. For added safety, synctool will not allow you to purge the root directory of a system.
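The mirroring rule can be expressed as a plan of what would change. The following Python sketch computes such a plan and touches nothing. It is not how synctool does it (synctool delegates the work to rsync), and the function names are hypothetical. Two limitations apply:

- it does not handle an entry that changes type, such as a file replaced by a directory;
- like synctool, it refuses to purge the root directory.

```python
import filecmp
import os


def list_tree(root):
    """All files and directories below root, as paths relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            found.add(os.path.normpath(os.path.join(rel, name)))
    return found


def purge_plan(src, dest):
    """Return (create, update, delete) lists that would make dest mirror src."""
    if os.path.realpath(dest) == os.path.sep:
        raise ValueError("refusing to purge the root directory")
    src_entries, dest_entries = list_tree(src), list_tree(dest)
    create = sorted(src_entries - dest_entries)
    delete = sorted(dest_entries - src_entries)  # unmanaged files on the target
    update = sorted(
        rel for rel in src_entries & dest_entries
        if os.path.isfile(os.path.join(src, rel))
        and os.path.isfile(os.path.join(dest, rel))
        and not filecmp.cmp(os.path.join(src, rel), os.path.join(dest, rel), shallow=False)
    )
    return create, update, delete
```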
Under the hood, synctool employs rsync to purge files. Hence, you cannot trigger actions through .post scripts in the purge directory, but it is possible to use synctool --diff, --ref, and even --single with files that reside under purge/.
Remember that purging is for making perfect mirrors. It is like sharing a directory across nodes. Once you start differentiating directory content between nodes, “purge” will no longer work in a satisfying way; in such a case, you should really use overlay/ rather than purge/.
dsh-cp also has an option --purge to quickly mirror directories across nodes. Use with care.
3.6 The order of operations
The previous sections described a lot of operations that synctool performs when it runs. This section summarises what we have seen so far. For a normal synctool run, the order of operations is roughly as follows.
- synchronise the synctool installdir to each node. This synchronises the repository as well as the main program and config file. Any subtrees under overlay, delete, and purge that do not apply to the target node are excluded.
- run synctool-client on the nodes
  - synctool-client mirrors the purge directory
  - synctool-client processes the overlay directory:
    - generate templates by running ._template.post scripts
    - compare files
      - check filetype
      - check file size
      - check MD5 checksum
      - check file ownership
      - check file mode
    - make backup copies
    - update files as needed
    - run .post script for any updated files
    - run .post script (if any) for changed directories
  - synctool-client deletes files listed in the delete directory
    - run .post script (if any) for deleted files
    - run .post script (if any) for changed directories
3.7 dsh-pkg, the synctool package manager
synctool comes with a package manager named dsh-pkg.
Rather than being yet another package manager with its own format of packages,
dsh-pkg is a wrapper around existing package management software.
dsh-pkg unifies all the different package managers out there so you can
operate any of them using just one command and the same set of command-line
arguments. This is particularly useful in heterogeneous clusters or when
you are working with multiple platforms or distributions.
dsh-pkg supports a number of different package management systems and will detect the appropriate package manager for the operating system of the node. If detection fails, you may force the package manager on the command line or in synctool.conf:
#package_manager apk
package_manager apt-get
#package_manager brew
#package_manager bsdpkg
#package_manager dnf
#package_manager pacman
#package_manager pkg
#package_manager yum
#package_manager zypper
dsh-pkg knows about more platforms and package managers, but currently only the ones listed above are implemented and supported.
dsh-pkg is pluggable. Adding support for other package management systems is rather easy. If your platform and/or favorite package manager is not yet supported, feel free to develop your own plug-in for dsh-pkg or contact the author of synctool.
The pkg module is for FreeBSD; use bsdpkg on other BSD systems.
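To get a feel for what "detecting the package manager" involves, here is a hypothetical shell sketch that prints the first installed command from the supported list. It is not dsh-pkg's actual logic, which may, for example, inspect the operating system release instead. bsdpkg is left out because it does not correspond to a single obvious command name.

```shell
#!/bin/sh
# Hypothetical detection sketch: print the first package manager
# from the supported list that is installed on this machine.
detect_package_manager() {
    # dnf is tried before yum because on some distributions
    # yum is merely a compatibility wrapper around dnf
    for pm in apt-get dnf yum zypper pacman apk brew pkg; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1    # nothing found: force it with package_manager instead
}
```

When such a check finds nothing, or finds the wrong tool, that is the situation in which the package_manager setting or the command-line override is needed.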
Following are examples of how to use dsh-pkg.
dsh-pkg -n node1 --list
dsh-pkg -n node1 --list wget
dsh-pkg -g batch --install lynx wget curl
dsh-pkg -g batch -x node3 --remove somepackage
Sometimes you need to refresh the contents of the local package database. You can do this with the --update option:
dsh-pkg -qa --update
You may check for software upgrades for the node with --upgrade.
This will only show what upgrades are available. To really upgrade a node,
specify --fix. It is wise to always test an upgrade on a single node first.
dsh-pkg --upgrade
dsh-pkg -n testnode --upgrade -f
dsh-pkg --upgrade -f
Package managers download their packages into an on-disk cache. Sometimes the disk fills up and you may want to clean out the disk cache:
dsh-pkg -qa --clean
A specific package manager may be selected from the command-line.
dsh-pkg -m yum -i somepackage # force it to use yum
If you want to further examine what dsh-pkg is doing, you may specify
--verbose or --unix to display more information about what is going on
under the hood.
3.8 Ignoring them: I’m not touching you
By using directives in the synctool.conf file, synctool can be told to
ignore certain files, nodes, or groups. These will be excluded (skipped).
For example:
ignore_dotfiles no
ignore_dotdirs yes
ignore .svn
ignore .gitignore .git
ignore .*.swp
synctool will not run on ignored nodes or on nodes that are in a group that is ignored:
ignore_node node1 node2
ignore_group broken
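The ignore patterns above use shell-style wildcards (*, ?, [...]). The hypothetical function below shows how such patterns match file names, using the shell's own case statement. It illustrates the matching rules only. synctool's internal implementation is not shown here.

```shell
#!/bin/sh
# Hypothetical sketch: return 0 (ignored) if NAME matches any of the
# glob PATTERNS given after it, e.g. is_ignored .motd.swp '.*.swp'
is_ignored() {
    name=$1
    shift
    for pattern in "$@"; do
        # an unquoted $pattern in a case label is treated as a glob
        case $name in
            $pattern) return 0 ;;
        esac
    done
    return 1
}
```

Quote the patterns when calling the function so that the shell does not expand them against files in the current directory.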
3.9 Backup copies
For any file synctool updates, it keeps a backup copy around on the target
node with the extension .saved. If you don’t like this, you can tell
synctool to not make any backup copies with:
backup_copies no
It is however highly recommended that you run with backup_copies enabled.
You can remove backup copies manually with:
synctool --erase-saved
synctool -e
To erase a single .saved file, use option --single in combination with
--erase-saved.
For some (Linux) directories like /etc/cron.d/ and /etc/xinetd.d/, it is
not OK to keep .saved files around because they influence how the daemons
function. For these directories it is recommended that you implement
a .post script that removes the backup copies, like so:
# $overlay/all/etc/xinetd.d.post
# use an absolute path so the script works regardless of its working directory
rm -f /etc/xinetd.d/*.saved
service xinetd reload
Alternatively, you may want to move the backup copies to a safe location.
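A .post script that moves the backups instead of deleting them could look like the sketch below. The function name move_saved and the destination /var/backups/synctool/... are made-up examples, not synctool conventions.

```shell
#!/bin/sh
# Hypothetical .post helper: move *.saved backups out of a directory
# that a daemon reads (e.g. /etc/xinetd.d) into a separate backup dir.
move_saved() {
    srcdir=$1
    destdir=$2
    mkdir -p "$destdir" || return 1
    for f in "$srcdir"/*.saved; do
        # if nothing matches, the glob stays literal: skip it
        [ -e "$f" ] || continue
        mv -f "$f" "$destdir"/ || return 1
    done
    return 0
}

# In a real .post script you might call, for example:
#   move_saved /etc/xinetd.d /var/backups/synctool/xinetd.d
#   service xinetd reload
```

Note that mv -f overwrites an older backup of the same name in the destination, so only the most recent backup of each file is kept.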
3.10 Logging
When you use option --fix to apply changes, synctool logs the changes it makes
to syslog on the master node. This provides a trace of what was changed on the
systems. On large clusters, this may produce a lot of log records. If you
don’t want any logging, you can disable it in synctool.conf:
syslogging no
When you do use syslogging, you may want to split off the synctool messages
to a separate file like /var/log/synctool.log. Please see your syslogd
manual on how to do this. In the contrib/ directory in the synctool source,
you will find config files for use with syslog-ng and logrotate.
3.11 About symbolic links
synctool requires all files in the repository to have an extension (well… unless you set require_extension to no), and symbolic links must have extensions too. Symbolic links in the repository will be dead symlinks, but they will point to the correct destination on the target node.
Consider the following example, where file does not exist ‘as is’ in the
repository:
$overlay/all/etc/motd._red -> file
$overlay/all/etc/file._red
In the repository, motd._red is a red & dead symlink to file. On the
target node, /etc/motd is going to be fine.
3.12 Slow updates
By default, synctool addresses the nodes in parallel, so updates run concurrently. In some cases you might not want any parallelism. There are two easy ways around this:
dsh --numproc=1 uptime
dsh -p 1 uptime
dsh --zzz=10 uptime
dsh -z 10 uptime
The first pair of commands (long and short option forms) tells synctool, or in
this case dsh, to run only one process at a time. The second pair does the same
thing, and also sleeps for ten seconds after running the command on each node.
Suppose you have a 60-node cluster and run with --zzz=60. You now have to
wait at least one hour for the run to complete.
The options --numproc and --zzz work for both the synctool and dsh
programs.
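The "at least one hour" figure follows from simple arithmetic: with one sleep per node, the sleeps alone take nodes × seconds, and the commands' own run time comes on top. The sketch assumes one sleep after every node, as described above. min_runtime is a made-up name.

```shell
#!/bin/sh
# Lower bound (in seconds) for a serialized run with --zzz,
# assuming one sleep of $2 seconds after each of $1 nodes.
min_runtime() {
    nodes=$1
    zzz=$2
    echo $((nodes * zzz))
}

secs=$(min_runtime 60 60)
echo "at least $secs seconds ($((secs / 60)) minutes)"
```

For 60 nodes and --zzz=60 this prints 3600 seconds, which is 60 minutes.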
3.13 Checking for updates
synctool can check whether a new version of synctool itself is available by
using the option --check-update on the master node. You can check
periodically for updates by using --check-update in a crontab entry.
To download the latest version, run synctool --download on the master node.
These functions connect to the main website at www.heiho.net/synctool.
3.14 Running tasks with synctool
synctool’s dsh command is ideal for running commands on groups of nodes.
On occasion, you will also want to run custom scripts with dsh.
These scripts can be placed in scripts/, and dsh will find them.
When running a command that resides under scripts/, dsh will sync this
script to the target node prior to running the command on the remote side.
This ensures that the current version of the script always runs on the
target node.
For example, if you have a script /opt/synctool/scripts/admin_example.sh,
then you might run:
dsh -n node1 admin_example.sh
No path to the script is required; dsh will find it.
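What goes into such a script is entirely up to you. As a hypothetical illustration, admin_example.sh might report the node's name and how full its root filesystem is. It uses only POSIX tools, so it runs unchanged on different node types.

```shell
#!/bin/sh
# Hypothetical contents of /opt/synctool/scripts/admin_example.sh:
# report this node's name and the usage of its root filesystem.
# "df -P" selects the POSIX output format: one line per filesystem.
usage=$(df -P / | awk 'NR == 2 { print $5 }')
echo "$(uname -n): root filesystem ${usage} used"
```

Running `dsh -n node1 admin_example.sh` would then sync the script to node1 and print one line from that node.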
Old versions had a tasks/ directory under the repository, and you could invoke
synctool with the --tasks option. This mechanism has been obsoleted by dsh
and the scripts/ directory.
Note that you can write scripts to do software package installations,
but you may also use the dsh-pkg command.
3.15 Multiplexed connections
synctool and dsh can multiplex SSH connections over a ‘master’ connection. This feature greatly speeds up synctool and dsh because it allows skipping the costly SSH connection setup (key exchange and authentication) for every command. Multiplexing is started through dsh:
dsh -M # start master connections
dsh -O check # check master connections
dsh -O stop # stop master connections
dsh -O exit # terminate master connections
You may also do this for certain groups or nodes, like so:
dsh -g all -M
dsh -n node1 -O check
synctool will detect any open control paths and use them if they are present.
The control paths (socket files) to each node are kept under synctool’s temp
directory (by default: /tmp/synctool/sshmux/).
These control paths are managed by ssh mux processes that are running in the
background. If your cluster is very large, you might find the large number of
ssh mux processes on the management node objectionable. These processes
are mostly sleeping, so they should not pose a problem.
The control paths may be given a timeout by using the config parameter
ssh_control_persist. Note that this parameter is only supported for
OpenSSH 5.6 and later. The timeout may also be specified on the command-line:
dsh -M --persist 4h
The ControlMaster and ControlPath options of ssh first appeared in OpenSSH
version 3.9. synctool also supports ControlPersist, which is present in
OpenSSH version 5.6 and later. See man ssh_config for more information on
these OpenSSH options.
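To see what multiplexing amounts to at the OpenSSH level, the hypothetical sketch below wraps the plain ssh options: -M (master mode), -N (no remote command), -f (go to background), ControlPath and ControlPersist, and -O for control commands. The socket naming (NODE.sock under /tmp/synctool/sshmux) is an assumption; synctool chooses its own names, so use dsh -M and dsh -O in practice.

```shell
#!/bin/sh
# Hypothetical sketch of SSH multiplexing with plain OpenSSH options.
# The socket naming scheme is an assumption, not synctool's.
MUXDIR=/tmp/synctool/sshmux

mux_path() {
    echo "$MUXDIR/$1.sock"
}

# start a background master connection to node $1 that persists
# for $2 (an OpenSSH time string such as 4h; default 1h)
mux_start() {
    mkdir -p "$MUXDIR"
    ssh -M -N -f -o ControlPath="$(mux_path "$1")" \
        -o ControlPersist="${2:-1h}" "$1"
}

# send a control command (check, stop or exit) to the master for node $2
mux_ctl() {
    ssh -o ControlPath="$(mux_path "$2")" -O "$1" "$2"
}
```

For example, `mux_start node1 4h`, then `mux_ctl check node1`, and finally `mux_ctl exit node1`. Any later `ssh -o ControlPath=... node1` reuses the open master connection.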
4. All configuration parameters explained
This chapter lists and explains all parameters that you can use in synctool’s configuration file.
backup_copies <yes/no>
When set to ‘yes’, synctool creates backup copies on the target nodes of files that it updates. These backup files will be named *.saved. The default for this parameter is yes.
colorize <yes/no>
In terse mode, synctool output can be made to show colors. Mind that this parameter only works when terse is set to yes. The default is yes.
color_info <color>
Specify the color for informational messages in terse mode. The default is default.
The following keywords customize colors. Valid color codes are:
white red blue default black green magenta bold darkgray yellow cyan
color_warn <color>
Specify the color for warnings in terse mode. The default is magenta.
color_error <color>
Specify the color for error messages in terse mode. The default is red.
color_fail <color>
Specify the color for failure messages in terse mode. The default is red.
color_sync <color>
Specify the color for sync messages in terse mode. These occur when synctool synchronizes file data. The default is default.
color_link <color>
Specify the color for link messages in terse mode. These occur when synctool creates or repairs a symbolic link. The default is cyan.
color_mkdir <color>
Specify the color for mkdir messages in terse mode. These occur when synctool creates a directory. The default is default.
color_rm <color>
Specify the color for rm messages in terse mode. These occur when synctool deletes a file. The default is yellow.
color_chown <color>
Specify the color for chown messages in terse mode. These occur when synctool changes the ownership of a file or directory. The default is cyan.
color_chmod <color>
Specify the color for chmod messages in terse mode. These occur when synctool changes the access mode of a file or directory. The default is cyan.
color_exec <color>
Specify the color for exec messages in terse mode. These occur when synctool executes a .post script. The default is green.
color_upload <color>
Specify the color for upload messages in terse mode. These occur when you use synctool to upload a file. The default is magenta.
color_new <color>
Specify the color for new messages in terse mode. These occur when a sync operation requires synctool to create a new file. The default is default.
color_type <color>
Specify the color for type messages in terse mode. These occur when the file type of the entry in the repository does not match the file type of the target file. The default is magenta.
color_dryrun <color>
Specify the color for the ‘DRYRUN’ message in terse mode. It occurs when synctool performs a dry run. The default is default.
color_fixing <color>
Specify the color for the ‘FIXING’ message in terse mode. It occurs when synctool is run with the --fix option. The default is default.
color_ok <color>
Specify the color for the ‘OK’ message in terse mode. It occurs when files are up to date. The default is default.
colorize_bright <yes/no>
In terse mode, synctool output can be made to show colors. This option enables the bright/bold attribute for colors. Mind that this parameter only works when both terse and colorize are enabled. The default is yes.
colorize_bold <yes/no>
Same as colorize_bright.
colorize_full_line <yes/no>
In terse mode, synctool output can be made to show colors. This option colors the full output line rather than just the leading keyword. Mind that this parameter only works when both terse and colorize are enabled. The default is no.
default_nodeset <group-or-node> [..]
By default, synctool will run on these nodes or groups. You can use this to make synctool by default work on only a subcluster rather than your whole hardware installation.
The default is all. You may set it to none to make synctool not run on a default set of nodes at all. Example:
default_nodeset test1 test2 testnodes xtest[1-10]
diff_cmd <diff UNIX command>
Give the command and arguments to execute diff. synctool runs this command when you use the option --diff to see the differences between the file in the repository and on the target node.
The exact location and arguments of the diff program are operating system specific; the PATH environment variable will be searched for the command if you do not supply a full path.
The default is:
diff -u
full_path <yes/no>
synctool likes to abbreviate paths to $overlay/some/dir/file. When you set this option to yes, synctool will display the true full path instead of the abbreviated one.
The default is no.
group <groupname> <subgroup> [..]
The group keyword defines compound groups. It is a means to group several subgroups together into a single group. If the subgroups do not exist yet, they are defined automatically as new, empty groups.
group wn workernode batch
group test wn
group g1 batch test wn
Group names are alphanumeric, but can have an underscore, minus, or plus symbol in between. The following are valid group names:
group group1 group-1 group_1 group+1 192_168_1 10 node1+node2
group A+B+C A B C
ignore <filename or directory name> [..]
This parameter enables you to have synctool ignore specific files or directories in the repository. Multiple ignore definitions are allowed. You may put multiple filenames on a line, and you may use wildcards. Example:
ignore .gitignore .git .svn
ignore .*.swp
ignore tmp[0-9][0-9][0-9]??
ignore_dotfiles <yes/no>
Setting this to ‘yes’ results in synctool ignoring all files in the repository that begin with a dot. This can be convenient, for example, for .gitignore. The default is no; do not ignore dotfiles.
ignore_dotdirs <yes/no>
Setting this to ‘yes’ results in synctool ignoring all directories in the repository that begin with a dot. This can be convenient, for example, for .svn/ directories. The default is no; do not ignore dotdirs.
ignore_group <group> [..]
This tells synctool to ignore one or more groups. You can use this to temporarily disable the group. This can be particularly useful when doing software upgrades.
ignore_node <nodename> [..]
This tells synctool to ignore one or more nodes. You can use this if you want to skip this node for a while for some reason, for example because it is broken.
include <(local) synctool config file>
This keyword includes a synctool configuration file (that is possibly located on the target node). You can use this to give certain nodes a slightly different synctool configuration than others. This can be important, especially in setups where you are running a multitude of operating systems.
include /etc/synctool_local.conf
Another good use of this option is to clean up your configuration:
include /opt/synctool/etc/nodes.conf
include /opt/synctool/etc/colors.conf
master <fqdn>
Indicates which node is the master, the management node from where you will run synctool to control the cluster. It should be set to the fully qualified domain name of the management host. You can get the fqdn with:
synctool-config --fqdn
node <nodename> <group> [..] [ipaddress:<name, IP address, or sequence>] [rsync:<yes/no>]
The node keyword defines what groups a node is in. Multiple groups may be given. The order of the groups is important; the left-most group is most important, and the right-most group is least important. This means that if there are files in the repository that have the same base filename but a different group extension, synctool will pick the file that has the most important group extension for this node.
Groups can be defined ‘on the fly’; there is no need for a group to exist before it can be used in a node definition.
Node names are alphanumeric, but can have an underscore, minus, or plus symbol in between. The following are valid node names:
node node1 node-1 node_1 node+1 10_0_0_2 10 node1+node2
The ipaddress specifier tells synctool how to contact the node. This is optional; when omitted, synctool assumes the nodename can be found in DNS. Note that synctool nodenames need not be the same as DNS names.
The synctool master tells the clients what node they are. When running synctool-client in stand-alone mode, the client tries to figure out by itself what node it is running on. If it fails to determine the nodename, you may pass option --nodename to force it.
The optional rsync:no specifier may be used to tell synctool not to sync the repository to the target node. This is only useful when the node has access to the repository by other means, such as a shared filesystem.
node node1 fs sched rack1
node node2 login rack1
node node3 test rack1 ipaddress:node8-mgmt
node node4 batch rack1 ipaddress:node9-mgmt rsync:no
node node5 wn rack1 ipaddress:node5-mgmt
node node6 wn rack1 ipaddress:node6-mgmt
node node7 wn rack1 ipaddress:node7-mgmt
node node[20-29] wn rack2 ipaddress:node[20]-mgmt
node node[30-39] wn rack3 ipaddress:192.168.3.[130]
As shown in this example, a node range may be given to define a number of nodes using a single definition line. The (optional) IP address may use the sequence notation, that numbers the IP addresses in sequence.
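To make the range and sequence notation concrete, the hypothetical function below expands a range the way the example above reads: node30 gets 192.168.3.130, node31 gets 192.168.3.131, and so on, with the bracketed address part counting up together with the node number. It is an illustration of that reading, not synctool's parser, and it does not handle zero-padded ranges such as node[001-010].

```shell
#!/bin/sh
# Hypothetical sketch: expand PREFIX[FIRST-LAST] into node names and
# pair each with ADDR_PRE + N + ADDR_POST, where N starts at START
# and counts up in step with the node number.
expand_nodes() {
    prefix=$1 first=$2 last=$3          # node name pieces
    addr_pre=$4 start=$5 addr_post=$6   # address pieces
    i=$first
    while [ "$i" -le "$last" ]; do
        seq=$((start + i - first))
        echo "$prefix$i $addr_pre$seq$addr_post"
        i=$((i + 1))
    done
}
```

For example, `expand_nodes node 30 39 192.168.3. 130 ""` lists node30 through node39 with addresses 192.168.3.130 through 192.168.3.139, and `expand_nodes node 20 29 node 20 -mgmt` pairs node20 with node20-mgmt.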
num_proc <number>
This specifies the maximum number of parallel processes that synctool will use. For large clusters, you will want to increase this value, but mind that this will increase the load on your master node. Setting this value higher than the number of nodes you have has no effect. The default is 16.
For synctool, dsh, dsh-pkg and the like, option --numproc can be given to override this setting.
package_manager <package management system>
Specify the package management system that dsh-pkg must use. If left out, dsh-pkg will detect what package manager it should use, but using this parameter you can force it if detection fails. This setting can be overridden from the command-line when invoking dsh-pkg.
Valid values for package_manager are:
apk apt-get brew bsdpkg dnf pacman pkg yum zypper
pkg_cmd <synctool-client-pkg UNIX command>
Give the command and arguments to execute synctool-client-pkg. dsh-pkg uses this command to do package management on the target nodes.
The exact location of the synctool-client-pkg program is installation dependent. However, dsh-pkg looks for it under the synctool root.
The default is:
$SYNCTOOL/bin/synctool-client-pkg
ping_cmd <ping UNIX command>
Give the command and arguments to execute ping. dsh-ping uses this command to check what nodes are up and running.
The exact location and arguments of the ping program are operating system specific; the PATH environment variable will be searched for the command if you do not supply a full path.
The default is:
ping -q -c 1 -w 1
(which assumes Linux ping options)
require_extension <yes/no>
When set to ‘yes’, a generic file in the repository must have the extension ._all. When set to ‘no’, an extension is not required and the group all is automatically implied. The default is yes.
rsync_cmd <rsync UNIX command>
Give the command and arguments to execute rsync. synctool uses this command to distribute the repository to the target nodes.
The exact location of the rsync program is operating system specific; the PATH environment variable will be searched for the command if you do not supply a full path.
The default is:
rsync -ar --delete --delete-excluded -e 'ssh -o ConnectTimeout=10 -x -q' -q
synctool will automatically add another option, --filter, to this command, which it uses to ensure that the correct overlay directories are synced to the nodes. So be mindful that you can tweak the parameters of the rsync command, but you cannot replace it with a different copying program unless that program also supports rsync’s filtering capabilities.
slave <nodename> [..]
Slave nodes get a full copy of the synctool repository. Slaves have no other function than that. You can not run synctool from a slave until you change it into a master node in the config file.
ssh_cmd <ssh UNIX command>
Give the command and arguments to execute ssh. synctool and dsh use this command to execute remote commands on the target nodes.
The exact location of the ssh program is operating system specific; the PATH environment variable will be searched for the command if you do not supply a full path.
The default is:
ssh -o ConnectTimeout=10 -x -q
ssh_control_persist <time|yes|none>
This parameter maps directly to the OpenSSH ControlPersist option. It sets the default timeout for multiplexed connections started with dsh -M. This parameter may be overridden on the command-line with dsh --persist. The time argument is a string like “1h” or “1h30m”. When it is “yes”, there is no timeout, and the connection will persist indefinitely until the master is stopped or terminated with dsh -O stop or dsh -O exit. Note that OpenSSH supports ControlPersist=no, but synctool does not. It can be set to none to call ssh without the -o ControlPersist option. The default timeout is 1 hour. This parameter only has effect for OpenSSH version 5.6 and later.
sync_times <yes/no>
Synchronize modification timestamps of files. Every file on the node will get the timestamp identical to the file in the overlay. Template generated files will get the timestamp of the template.
This setting only works for files (and special files). Timestamps of symbolic links and directories are not managed.
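For a single file, the effect of sync_times resembles POSIX touch -r, which copies the timestamps of a reference file. This is an illustration only: synctool sets the timestamps itself, and sync_mtime is a made-up name.

```shell
#!/bin/sh
# Give target file $2 the same timestamps as repository file $1.
# (touch -r copies both the access and the modification time.)
sync_mtime() {
    touch -r "$1" "$2"
}
```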
The default is no.
synctool_cmd <synctool UNIX command>
Give the command and arguments to execute synctool-client. synctool uses this command to execute synctool on the target nodes.
The exact location of the synctool-client program is installation dependent. However, synctool looks for it under the synctool root.
The default is:
$SYNCTOOL/bin/synctool-client
syslogging <yes/no>
Log any updates to syslog. Nothing is logged for dry runs. The default is yes.
tempdir <directory>
Directory where the synctool master will create temporary files. The default is /tmp/synctool. There may be only one tempdir defined, and it must be an absolute path.
terse <yes/no>
In terse mode, synctool shows a very brief output with paths abbreviated to //overlay/dir/.../file. The default is no.
5. Best practices
This chapter contains tips, tricks and examples of how to make good use of synctool.
5.1 Use logical group names
synctool allows you to use nodenames as extensions on files in the repository. This is nice because it allows you to easily differentiate for a single node. However, it is much better practice to put that node in a certain group, and let it be the only member of that group. Now label the file as belonging to that group, and you’re done.
Bad:
overlay/all/etc/hosts.allow._all
overlay/all/etc/hosts.allow._node1
overlay/all/etc/motd._all
overlay/all/etc/motd._node1
Good:
overlay/mycluster/etc/hosts.allow._all
overlay/mycluster/etc/hosts.allow._login
overlay/mycluster/etc/motd._all
overlay/mycluster/etc/motd._login
The advantage is that you will be able to shift the service to another node simply by changing the synctool configuration, rather than having to rename all files in the repository.
5.2 Future planning: Be mindful of group ‘all’
synctool has this great group ‘all’ that applies to all nodes. One day,
however, you decide to add a new machine to your cluster that has a pretty
different role than any other. (Not) surprisingly, synctool will want to apply
all files tagged as all to the new node, but in this case, that is exactly
what you do not want.
The problem is that all is too generic, and the solution is to rename
overlay/all/ to something else, such as overlay/common/, or better yet,
overlay/subcluster1/. This moves all out of the way, and you can integrate
your new node in a better way.
The lesson here is that overlay/all/ is a nice catch-all directory, but
it’s maybe best left unused. It’s perfectly OK for files to be tagged as
._all, but they really should be placed in a group-specific overlay
directory.
5.3 Use group extensions on directories sparingly
synctool allows you to add a group extension to a directory, like so:
$overlay/all/etc._mygroup/
This is a powerful feature. However, it can make a mess of your repository
as well. If you catch yourself having to use find all the time to pinpoint
the location of a file in the repository, chances are that you are making
things harder on yourself than they need to be.
Maybe it is better structured as:
$overlay/mygroup/etc/
Or maybe it is better structured as:
$overlay/all/etc/somefile._all
$overlay/all/etc/somefile._mygroup
The main message is “keep it simple”. Try not to use too many group directories, because it makes things complicated.
5.4 Do not manage the master node
It is recommended that you do not manage the master node with synctool. The reason is that it makes things more complicated when you choose to put the configuration of your master node under control of synctool.
Why is this? It is mainly because synctool by default works on ‘all’ nodes,
and for some reason it is unnatural when ‘all’ includes the master node too.
Imagine calling dsh reboot to reboot all nodes, and pulling down the master
with you (in the middle of the process, so you may not even succeed at
rebooting all nodes).
It often means doing double work; the config files of the master node tend to differ from the ones on your nodes. It is good practice though to label the master node as master, and to simply ignore it:
node n01 master
ignore_group master
It’s also OK to leave the master node out of the configuration altogether.
(Here, ‘master’ is a group, not to be confused with the master keyword that
defines the master node. Are you still with me?)
If you still want to manage the master with synctool, do as you must. Just be
sure to call dsh -X master reboot when you want to reboot your cluster.
5.5 Managing multiple clusters with one synctool
It is really easy to manage just one cluster with one synctool. This means that your repository will contain files that apply only to that cluster. In reality, you may have only one management host for managing multiple clusters, or you may have several subclusters in a single cluster.
Managing such a setup with synctool used to be hard & hackish, but since version 6 it is only a matter of using groups in the right way.
Add all clusters to the synctool repository as you would with adding more
nodes. Create a group for each (sub)cluster. For each cluster group, add a
directory overlay/cluster/, so you can handle them independently whenever
you wish to. Your repository will look like this:
$overlay/all/
$overlay/cluster1/
$overlay/cluster2/
$overlay/cluster3/
If you tend to reinstall systems a lot with different operating systems, it may be a good idea to create per OS groups. This also helps when upgrading to new OS releases.
$overlay/wheezy/
$overlay/centos64/
$overlay/sles11sp2/
Note that files under these directories can still be marked ._all; synctool
will select the correct file as long as you tag the node with the right group.
Decide for yourself what files should go under what directory, and what layout works best for you.
5.6 Use a tiered setup for large clusters
If you have a large cluster consisting of hundreds or thousands (or maybe
more) nodes, you will run into a scaling issue at the master node.
synctool doesn’t care how many nodes you manage with it, but doing a
synctool run to a large number of nodes will take considerable time. You can
increase the num_proc parameter to have synctool do more work in parallel,
but this will put more load on your master node. To manage such a large
cluster with synctool, a different approach is needed.
The solution is to make a tiered setup. In such a setup, the master node syncs to other master nodes (for example, the first node in a rack), which in turn sync to subsets of nodes. Script it along these lines:
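#!/bin/bash
# assumes RACKS holds the rack group names, e.g. RACKS="rack1 rack2"
for rack in $RACKS
do
    (
        # give the rack master a full copy of the repository
        rsync -a --delete /opt/synctool/ "${rack}-n1:/opt/synctool/"
        # run synctool on the rack master for the nodes in its rack
        dsh -n "${rack}-n1" --no-nodename \
            /opt/synctool/bin/synctool -g "$rack" "$@"
    ) &    # handle each rack in its own background subshell
done
wait       # wait until all racks are done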
So, the master node syncs to ‘rack masters’, and the rack masters in turn
run synctool on their subset of nodes. In the config, the nodes are
grouped by rack. The option --no-nodename is used with dsh to make the
output come out right.
You also still need to manage the rack masters — with synctool, from the
master node.
A slightly different solution is to make use of slave nodes: the master syncs full copies only to the slaves, and the slaves then manage the nodes. This requires having multiple config files (e.g., one per rack) and scripting it so that it uses the correct config file for each rack.
synctool -c slaves.conf "$@"
for rack in $RACKS
do
dsh -n ${rack}-n1 --no-nodename \
synctool -c confs/${rack}.conf "$@"
done
This tip is mentioned here mostly for completeness; I recommend running with a setup like this only if you are truly experiencing problems due to the scale of your cluster. There are security implications to consider when giving nodes a full copy of the repository. It depends on your situation whether it is acceptable to run like this.
5.7 Manage hosts behind a gateway
As synctool relies on SSH authentication, you can easily manage hosts that are not directly reachable. Imagine this setup:
synctool-master-node
|
Internet
|
gateway.your.domain
|
+ privatenode1
+ privatenode2
You need to set up your ssh connection as follows in /root/.ssh/config:
Host *.intra.your.domain
ProxyCommand ssh gateway.your.domain -W %h:22
Add your hosts in synctool.conf:
node privatenode1 group1 group2 ipaddress:privatenode1.intra.your.domain
node privatenode2 group1 group2 ipaddress:privatenode2.intra.your.domain
Of course, this also requires a proper DNS setup for your intra zone
on gateway.your.domain.
6. Thank you
A big thanks goes out to all synctool users. You know who you are (!) Special thanks go to those whom I’ve had e-mail conversations with about synctool, and those at github who have made good suggestions for improving synctool. And last but not least, very special thanks go to those who contributed to synctool in one way or another. synctool would not be what it is today without your contributions.
If you also want to contribute to synctool, drop me an e-mail. If you are a
programmer who likes to contribute to the development of synctool, please
fork the github project and issue pull requests. If you don’t know how
git works, patches in diff -u format are always welcome too. However,
I highly recommend git and github.