Rsnapshot

From ConShell

Introduction

rsnapshot is a filesystem snapshot utility for making backups of local and remote systems.

http://www.rsnapshot.org/

Schedules

A very simple schedule is the 2-4 schedule, which yields full backup sets for today, yesterday, and one per week for the past four weeks (roughly a month).

For that, rsnapshot.conf should contain the following retain lines (spelled interval in older rsnapshot versions), with the fields separated by tabs:

retain	daily	2
retain	weekly	4

To be clear, it would not contain any hourly, monthly, or yearly lines.
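Putting that in context, a minimal rsnapshot.conf built around the 2-4 schedule might look like the sketch below. The paths are placeholders, and remember that fields must be separated by tabs, not spaces:

```
config_version	1.2
snapshot_root	/backup/snapshots/

retain	daily	2
retain	weekly	4

backup	/home/	localhost/
backup	/etc/	localhost/
```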

Cronjobs

cat /etc/cron.d/rsnapshot
0       23       *       *       *   root    /usr/bin/rsnapshot daily
30      22       *       *       6   root    /usr/bin/rsnapshot weekly

Always run the less frequent set (weekly) at least 30 minutes before the more frequent daily set, so the weekly rotation can promote the oldest daily snapshot before it is rotated away. This rule applies to all sets.

Another cronjob that is useful is to notify someone when errors occur.

# This sends a copy of any errors that occurred via e-mail to root
# Useful for identifying when things have broken
MAILTO=root@example.com
21      6       *       *       *       root    grep ERROR /var/log/rsnapshot.log

Another (probably better) way to handle errors from cron is to set verbose 2 in rsnapshot.conf; then any batch run will be quiet unless there are problems, so cron only sends mail when something went wrong.
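As a sketch of the log-grepping idea above (a throwaway file stands in for the real log, whose path comes from the logfile directive in rsnapshot.conf):

```shell
# Stand-in for /var/log/rsnapshot.log (set by the "logfile" directive).
log=$(mktemp)
printf '%s\n' \
  '[11/Apr/2024] /usr/bin/rsnapshot daily: started' \
  '[11/Apr/2024] ERROR: /usr/bin/rsnapshot daily: rsync exited nonzero' \
  > "$log"

# Under cron, anything a job prints gets mailed to MAILTO, so printing
# only ERROR lines means mail arrives only when something broke.
grep ERROR "$log"
```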

Exclusions

To exclude data (files, folders), determine the path RELATIVE to the base folders specified in rsnapshot.conf.

There are multiple ways to specify excludes.

You can do them globally with a line like so in rsnapshot.conf

 exclude         .cache/
 exclude         Trash/
 exclude         "username/VirtualBox VMs/"

You may also put all your exclude paths in one file and point to it with

 exclude_file /path/to/exclude_file

Then for each backup point you may do one or more specific exclusions

 backup  /usr/local/     localhost/  exclude=tmp

If you need more than one exclude on a backup line, prefix each subsequent one with --, as in

 backup  /usr/local/     localhost/  exclude=tmp --exclude=var/cache/

Common Exclusions

Thumbs.db - image cache files created & used by Windows. Should be safe to exclude and/or remove.

.AppleDouble/ - resource and data forks from Mac OS X.

._* - metadata files (resource fork) created and used by Mac OS X. Should be safe to exclude and/or remove.

.DS_Store - metadata files (resource fork) created and used by Mac OS X Finder. One per directory. See https://en.wikipedia.org/wiki/.DS_Store

Temporary Items/ and .TemporaryItems/ - scratch folders on Mac OS X. See https://discussions.apple.com/message/21885923#21885923 Specify as below; the ? wildcard stands in for the space, which would otherwise be taken as a field separator:

exclude<tab>Temporary?Items/
exclude<tab>.TemporaryItems/

Compression

Backup disk full? rsnapshot doesn't mix well with compressed backups, because compression breaks the hard links that make the archive sets space-efficient. But there are a couple of work-arounds.
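To see why, recall that rsnapshot keeps unchanged files as hard links between sets, so one inode's data is shared by several snapshot directories. Compressing one copy writes a brand-new file and unlinks the old name, so the sharing is lost. A small sketch, with gzip standing in for any compressor:

```shell
dir=$(mktemp -d)
printf 'unchanged file contents\n' > "$dir/a"
ln "$dir/a" "$dir/b"        # two names, one inode -- like daily.0 and daily.1
stat -c %h "$dir/a"         # link count is 2: the data is stored once

gzip -f "$dir/b"            # compress one "set" (-f needed: gzip otherwise
                            # refuses to touch files with other hard links)
stat -c %h "$dir/a"         # link count is 1: b.gz is a separate copy now
```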

The first is that SOME of the files in the second-and-later archive sets can be compressed: those that are not hard-linked into any other set.

find daily.1 -links 1 -size +1M ! -name "*.bz2" ! -path '*.svn*' -print0 | xargs -0 pbzip2 -v

Revert

find daily.1 -links 1 -name "*.bz2" -print0 | xargs -0 pbzip2 -d -v

Only run this against the second archive set of the shortest interval in your configuration (e.g. hourly.1, daily.1, or weekly.1, whichever is your shortest).
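The -links 1 test is what keeps this safe: it matches only files whose inode has a single name, i.e. files not shared with any other set. A quick illustration of that behaviour, using throwaway directories to stand in for real archive sets:

```shell
root=$(mktemp -d)
mkdir "$root/daily.1" "$root/daily.0"
printf 'old, unshared\n' > "$root/daily.1/unlinked"   # exists only in daily.1
printf 'shared\n'        > "$root/daily.1/shared"
ln "$root/daily.1/shared" "$root/daily.0/shared"      # hard-linked into daily.0

# Only the unshared file is a candidate for compression:
find "$root/daily.1" -links 1 -type f
```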

Another approach is to use a filesystem that supports native/transparent compression, such as ZFS or Btrfs.


Q & A

How much data can rsnapshot back up, and for how long?

A. This can be calculated as the size of the original backup set + the "churn", where churn is the size of all the files added or removed during the range from newest to oldest backup set. If the original+churn is larger than the total storage available for the backup, you will run out of disk space.

How can I get the most usage out of my available disk space?

A. Follow the best practices (below). Reduce the size of the original set, e.g. by using excludes. Use the compression trick described above (only compress the second set of the shortest interval), or use a filesystem that supports transparent compression (hint: Btrfs and ZFS are two).

How can I see how much disk space is used by each backup set?

A. You might use different approaches here, but my favorite is du

$ sudo du -sch {hour,dai,week,month,year}ly.*
160G	daily.0
5.3G	daily.1
2.2G	daily.2
1.2G	weekly.0
...

Note that simply using du -sh * will generate a misleading report: du counts hard-linked data only once, against the first directory it scans, and the shell sorts globs alphabetically, so monthly.* would be counted before weekly.* (and hourly.* after daily.*). In other words, you need to match the globs up with the chronology of events.

What are some best practices for using rsnapshot?

A. First of all, when backing up logfiles, make sure your logrotate.d/* configurations specify the 'dateext' and 'compress' options, so that rotated logfiles carry a date stamp and are precompressed before rsnapshot gets to them. This saves rsnapshot from storing multiple copies of the same data, since the filenames won't keep changing (0 to 1, 1 to 2, etc.). Another is to use exclude=foo options to minimize excessive backups.

Rsnapshot seems to fail in mysterious ways... what are some ways to mitigate these problems?

For example, old backup sets hanging around, configuration pickiness (a space where a tab belongs), and other types of nonsense.

A. There are a number of things that can be done. First, run a configuration check (rsnapshot configtest) via cron one or more times per day, so a broken configuration doesn't go unnoticed for long. Examining the contents of daily.0 (or hourly.0 if you go hourly) with ls -al daily.0/ can expose stale folder structures, perhaps from a host that no longer exists. Finally, bump up the logging verbosity and make sure the output from the rsnapshot cronjobs is sent to you via e-mail; review it regularly for any sign of problems. Alternatively, see the cronjob example above for a command that weeds the errors out of the log.
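One cheap check for the space-for-tab class of problem is a grep over the configuration. This is a rough heuristic, not a parser, and a throwaway file stands in for /etc/rsnapshot.conf here:

```shell
conf=$(mktemp)                           # stand-in for /etc/rsnapshot.conf
printf 'retain\tdaily\t2\n'  >  "$conf"  # good: tab-separated fields
printf 'retain weekly 4\n'   >> "$conf"  # bad: space-separated fields

# Flag config lines whose first field is followed by a space, not a tab.
grep -n '^[a-z_]* ' "$conf"
```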

How can I prevent stale / orphaned data from leeching my disk space?

A. Over time (as hosts come and go) you may find stale data left in daily.0, never getting purged out as it should. The way I deal with this is to write a cronjob, /etc/cron.daily/breadcrumb, on each system you back up. Have it touch a .crumb file in the root of each partition that gets backed up. Complement this with a cronjob on the rsnapshot server which finds the stale breadcrumbs and notifies you. Example...

sudo find /path/to/backup/daily.0 -maxdepth 4 -name .crumb -mtime +1 -ls
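The client side might look like the sketch below. The script name and partition list are hypothetical, and the demo loops over throwaway directories so it can run anywhere; the real /etc/cron.daily/breadcrumb would list /, /home, and so on:

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/breadcrumb: refresh a marker file at the
# root of every partition that gets backed up, so the server-side find
# shown above can flag hosts whose data has gone stale.
p1=$(mktemp -d); p2=$(mktemp -d)   # demo stand-ins for real mount points
for mnt in "$p1" "$p2"; do         # the real script would list / /home ...
    touch "$mnt/.crumb"
done
ls "$p1/.crumb" "$p2/.crumb"
```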

How can I preserve hard-links in my backups?

A. By default rsync_short_args is just -a, which does not include -H (preserve hard links). To add it, specify like so...

rsync_short_args	-aH

Can I skip hourly, weekly, etc?

A. It is safe to skip certain end sets like hourly and yearly in your rotation. You could even skip both hourly AND daily, starting with weekly. It does not appear possible to skip intermediate sets like weekly. In other words, you can't do hourly -> weekly -> yearly, because rsnapshot expects to create weekly.0 from the oldest daily.X, and yearly.0 from the oldest monthly.X.
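For example, a rotation that starts at weekly (legal under this rule) would contain only consecutive retain lines from there up (retain is spelled interval in older versions):

```
retain	weekly	4
retain	monthly	6
retain	yearly	2
```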