How to make backups with Linux and Rsync?

As a web host, we needed an automated mechanism for generating snapshots of the filesystems on our Linux-based servers. There are a number of ways to back up Linux systems, including remote backups using tar/ssh/cron and incremental tar backups on a local filesystem. One drawback of using tar to back up an entire filesystem is that some systems cannot create a compressed tarball larger than 2GB.

Rsync offers a reliable mechanism for synchronizing files and directories from one location to another while minimizing data transfer by only transferring deltas. Rsync is included in most Linux distributions, and installation is very easy. Properly configured rsync that performs system backups can protect against hard disk failures and system compromises.

2. What is Rsync?

Rsync is a little Linux utility that synchronizes filesystems from one place to another by only copying diffs (deltas) of files that have changed. Rsync optionally compresses the files on the fly before transfer (to save transfer time) and may be used in conjunction with rsh or ssh to perform remote file transfers. Rsync may be used as a backup or mirroring utility.

The advantage of using rsync over other archive and copy utilities such as tar, dump and rcp is that rsync (1) can use ssh as a secure channel to transfer files over the network, (2) can retain the ownership and permissions of the files being transferred, (3) keeps files and directories synchronized (files deleted at the source can also be removed from the replica), and (4) transfers only the "deltas" of files that have changed since the last replication, making transfers much faster. If rsync is used without ssh, it uses TCP port 873.

3. How does Rsync work?

Rsync can be used in standalone or client/server mode, with client/server mode a little more common.

In a standalone mode, you may use rsync to copy files and directories by running the rsync command on the command line. This is useful when replicating files and directories on the same machine, or replicating between two machines using rsh/ssh channel. By using ssh, you're using TCP port 22 instead of TCP port 873 (rsync). To use ssh without supplying a password (in automated backup), you're required to set up a trusted environment between the two machines by generating a private/public pair of keys and installing them on the machines. Instructions on setting up the private/public key pairs are described in Setting up trusted ssh environment with public/private key pair article.
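As a rough sketch of the key setup mentioned above (not a substitute for the referenced article), a passphrase-less key pair can be generated with ssh-keygen and its public half installed on the remote machine. The key type (ed25519), the key file name, and the username@remote_host placeholder are assumptions; the key is written to a scratch directory here so the snippet can be tried safely.

```shell
#!/bin/bash
set -eu

# Scratch directory for the demo; a real setup would typically use ~/.ssh.
keydir=$(mktemp -d)
trap 'rm -rf "$keydir"' EXIT
keyfile="$keydir/backup_key"      # hypothetical key name

# Generate a key pair with an empty passphrase (-N "") so that
# unattended jobs can use it without prompting.
ssh-keygen -q -t ed25519 -N "" -f "$keyfile"

# The private key stays on this machine; the .pub file is installed remotely.
ls -l "$keyfile" "$keyfile.pub"

# Not run here because it needs a reachable host (placeholders are hypothetical):
#   ssh-copy-id -i "$keyfile.pub" username@remote_host
#   rsync -a -e "ssh -i $keyfile" source username@remote_host:/path/to/destination
```

A key without a passphrase can be used by anyone who obtains the private key file, so it should stay readable only by the account that runs the backup.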

In client/server mode, one machine becomes an "rsync server" by running rsync in daemon mode, and one or more client machines may then synchronize files to and from the server. Setting up an rsync server requires customizing the rsync configuration file, which resides in /etc/rsyncd.conf (or a similar location). Running rsync in client/server mode does not require an rsh/ssh transport channel; it uses TCP port 873, which is designated for the rsync protocol.

4. Running Rsync in standalone mode

If you intend to replicate a filesystem on a local machine or use rsh/ssh as the secure channel to transfer files from one machine to another, you can use Rsync in standalone mode.

To copy files from one directory structure to another, you may simply run the rsync command. The -a switch retains the owner and permission information of the files being copied. This must be executed by the 'root' user in order to change user and permission data.

bash# rsync -a source destination

The command above is similar to "cp -r from to/", where the {to} directory must already exist. Similarly, replicating the filesystem from one machine to another may be done by running:

bash# rsync -a -e ssh source username@remote_host:/path/to/destination

It should be noted that rsync does care about trailing slash in the source argument. If a trailing slash ("/") is supplied in the source argument, the contents of the directory are copied whereas if no trailing slash ("/") is supplied, the entire directory is copied. The trailing slash in the destination has no significance as it is always expected as a directory.

For example, "rsync -a a b" copies directory a inside the b, and hence the files are copied to the b/a/ directory. If, however, "rsync -a a/ b" is used, the files are stored in b/ directory without the directory a.

5. Running Rsync in client/server mode

To use rsync in client/server mode, we must set up an rsync server. Setting up an rsync server involves two steps: (A) customizing the /etc/rsyncd.conf configuration file, and (B) running the rsync command in daemon mode.

A. Configuring /etc/rsyncd.conf configuration file.

The rsync configuration file looks very similar to the Samba configuration file, as rsync is co-authored by Andrew Tridgell, an author of Samba. A detailed description of rsyncd.conf can be found in the rsyncd.conf man page. An example rsync configuration file may look something like this:

motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
secrets file = /etc/rsyncd.secrets

[target]
   path = /home
   comment = User home directories
   uid = nobody
   gid = nobody
   auth users = scott, michael
   hosts allow = 192.168.0.0/24
   hosts deny = *
   list = false

Important: It should be noted that rsync will NOT grant access to a protected share if the secrets (password) file noted above (/etc/rsyncd.secrets) is world-readable.

Settings worth noting in the configuration above include "target", the name used to refer to a particular rsync target. Within a target block, a number of configuration options may be defined. The "path" option specifies the directory to be rsync'ed, and "auth users" restricts access to the users listed in the secrets file; these need not be system users. The "uid" and "gid" options set the user and group that the transfer runs as. "hosts allow" and "hosts deny" restrict which hosts can transfer files to/from the server. It is strongly advised that "hosts allow" and "hosts deny" be set, because without them any host may connect to the target.

We need to create a secrets file, /etc/rsyncd.secrets, with the contents:

scott:helloworld

The secrets file above contains a user, "scott", with the password "helloworld". Since the password is stored in plain text, the file must be owned by root and readable only by root (permission 400 or 600). Otherwise, rsync will refuse to use the file and will not grant access to the protected target.
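One way to create such a file with restrictive permissions is sketched below. It writes to a scratch directory rather than /etc so that it can be tried without root; on a real server the path would be /etc/rsyncd.secrets and the commands would be run as root. The variable names are illustrative only.

```shell
#!/bin/bash
set -eu

# Scratch location for the demo; the real file would be /etc/rsyncd.secrets.
secrets_dir=$(mktemp -d)
trap 'rm -rf "$secrets_dir"' EXIT
secrets_file="$secrets_dir/rsyncd.secrets"

# Tighten the umask first so the file is never briefly world-readable.
umask 077
printf '%s:%s\n' "scott" "helloworld" > "$secrets_file"
chmod 600 "$secrets_file"

# Print the octal mode and file name to confirm the permissions (mode 600).
stat -c '%a %n' "$secrets_file"
```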

B. Running rsync daemon

You may launch the rsync daemon in one of two ways: via xinetd or as a standalone daemon. When it is run from xinetd, the following two files need to be edited.

bash# nano /etc/services
...
rsync 873/tcp
...

bash# nano /etc/xinetd.d/rsync
service rsync
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
        log_on_failure  += USERID
}

bash# service xinetd restart

The example above allows rsync to be run via the xinetd daemon. To restart the rsync daemon, you may restart the xinetd daemon.

Alternatively, you may run rsync in a daemon mode from the command line.

bash# rsync --daemon

Once we have the rsync server set up, we can run the rsync client from a client machine. To fetch a particular target defined in the /etc/rsyncd.conf configuration file, run rsync in the following manner:

bash# rsync -a scott@rsync_server::target /opt/rsync/backup

Password: ******

Notice that we do NOT specify a source path in the command above, but instead, a target name ("target") is specified after :: separator. The rsync configuration file describes the target with access control in detail. Enter the password defined in the secrets file.

FAQ: How do you bypass the password prompt?

If you wish to automate rsync with cron, you must bypass the password prompt. If you're connecting to an rsync daemon on TCP port 873, you may use the RSYNC_PASSWORD environment variable. Just write a simple bash script that sets the RSYNC_PASSWORD variable just before invoking the rsync command, as shown below. Because the script contains a clear-text password, it's important to protect it with a permission mode of 600 (chmod 600) so that no one except its owner (root) can read it.

#!/bin/bash
RSYNC="/usr/bin/rsync -a --delete"
export RSYNC_PASSWORD=helloworld

$RSYNC scott@192.168.0.2::target /path/to/local/filesystem
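To run the script above from cron, a crontab entry along the following lines could be added to root's crontab (crontab -e). This is a sketch only: the script path /root/bin/rsync-backup.sh, the 02:30 schedule, and the log file location are assumptions. The script is invoked through /bin/bash so that it works even when its mode is 600.

```
# m   h   dom mon dow  command
30    2   *   *   *    /bin/bash /root/bin/rsync-backup.sh >> /var/log/rsync-backup.log 2>&1
```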

If you're using an SSH channel, you'll have to set up a trusted environment with a public/private key pair. To learn how to set up trusted ssh environment, please review Setting up trusted ssh environment.

6. Some useful command-line options

--delete When rsync is used to replicate one filesystem to another, the --delete option removes a file from the destination filesystem once it has been deleted from the source filesystem. By default, rsync keeps the deleted file in the destination filesystem. More rsync examples can be found at http://rsync.samba.org/examples.html.
