I keep most of my data in the cloud, but there are various files on my local workstation that need to be backed up. For this, I use rdiff-backup and s3 with s3fs. This setup gives me a mirror of my current files in the cloud and a history of changes made for as long as I want to keep them.
If you don’t want to use S3 or s3fs, you can skip to the rdiff-backup section, as that will work with any file system path, whether or not it is mounted to an s3 bucket.
Creating an AWS account and S3 bucket is out of the scope of this how-to. Suffice it to say, you’ll need an S3 bucket along with an access key and secret that can read and write to it.
On your Linux system, you’ll need to install the AWS cli tool: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html
Once installed, run aws configure to provide your access key and secret.
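As a quick sanity check, you can confirm the credentials work by listing the bucket you plan to use. The bucket name below is a placeholder; substitute your own:

```
# Interactive prompts for access key, secret, default region, and output format
aws configure

# Sanity check: confirm the credentials can see the bucket
# (my-backup-bucket is a placeholder)
aws s3 ls s3://my-backup-bucket
```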
I’ve noticed that things work much better if your CLI’s default region (I use us-east-1) matches the region where your s3 bucket lives. When they differ, some tools have issues, s3fs included.
I’ve also found that s3 allows you to put periods in your bucket names, but this also confuses many s3 clients, including s3fs. There are work-arounds, but if you can choose your s3 bucket name, don’t put a period in it.
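For reference, here’s roughly how mounting the bucket with s3fs looks. The bucket name, mount point, and keys are placeholders; the key:secret credential file is the standard s3fs passwd_file format:

```
# Store the access key and secret where s3fs expects them (key:secret format)
echo 'YOUR_ACCESS_KEY:YOUR_SECRET_KEY' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Create the mount point and mount the bucket
# (my-backup-bucket and /mnt/backup are placeholders)
sudo mkdir -p /mnt/backup
s3fs my-backup-bucket /mnt/backup -o passwd_file=${HOME}/.passwd-s3fs
```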
rdiff-backup is a well-known python script that provides a mirror backup and
stores changes over time. By mirror backup, I mean that the backup destination
is a fully accessible copy of the files you’re backing up. A drawback of this
is that the full backup is not compressed, but the benefit is super easy access
to your most recent backup files.
rdiff-backup is also somewhat unique in that it stores incremental backup
data separately. Rather than starting with a full backup that grows stale while
newer, incomplete incremental backups pile up on top of it, rdiff-backup keeps
the mirror current and appends historical changes in a sort of journal. I prefer this method
because you always have a single full backup you can easily browse and restore
from, while changes over time can grow as long as you want them to. You can
truncate them whenever you want with a simple command without impacting your
current full backup.
On Arch, just run pacman -S rdiff-backup to install it. I gather it’s available on most distributions.
You can use the --include and --exclude command line parameters if you only
have a few exclusions, but I have a fairly large list, so I use files. I’ve
got a file named exclude.txt listing all the full file system paths of files
and folders I want to exclude. I also have an include.txt file to force some
subfolders of an excluded folder to be included. For example:
/home/bagaag/Downloads
/home/bagaag/.mozilla
/home/bagaag/.minecraft
This lets me back up the screenshots folders inside excluded locations like
.minecraft while leaving the rest of those folders excluded.
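An include.txt for that setup might look something like this (the exact paths are illustrative; yours will differ):

```
/home/bagaag/.minecraft/screenshots
```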
I keep my backup commands in a bash script aliased to backup on my system,
and I’ve gotten into the habit of running it before installing updates, which
on Arch Linux is nearly daily. My laptop is off when I’m not using it, so this
works better for me than a cron job.
Here are the commands in my backup script:
rdiff-backup --include-filelist /home/bagaag/scripts/include.txt --exclude-filelist /home/bagaag/scripts/exclude.txt /home/bagaag /mnt/backup
rdiff-backup --remove-older-than 3M /mnt/backup
That second line removes history in the backup older than 3 months.
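If you want to see what history is available before pruning it, rdiff-backup can list the stored increments (assuming the same /mnt/backup destination as above):

```
# List the increments (snapshots over time) stored in the backup destination
rdiff-backup --list-increments /mnt/backup
```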
You can restore from backup by simply copying files from your backup folder.
It’s that easy. If you want to go back in time, you can use --restore-as-of
and the other restore commands rdiff-backup makes available for doing this.
There are lots of options.
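As a sketch, restoring a single file as it existed three days ago might look like this (the file paths are placeholders, and 3D is rdiff-backup’s shorthand for three days):

```
# Restore Documents/notes.txt as it was 3 days ago into /tmp
rdiff-backup --restore-as-of 3D /mnt/backup/Documents/notes.txt /tmp/notes.txt
```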
Check out <rdiff-backup.nongnu.org/examples.html> for a nice walk through the
main features of rdiff-backup.