Configure GitLab backup into Amazon S3

Sep 19, 2016

In the previous post, I tried to write a step-by-step procedure to install and configure a GitLab Community Edition server. Now, whenever we run a server with some kind of hosting, whatever that may be, site hosting, file hosting, image hosting or any other kind of hosting, a proper backup is needed. Whatever we do to protect our server, a cloud VPS may crash (unlikely, but possible), the hosting account may be suspended for some unknown reason, the host may go down due to bankruptcy, or the server’s IP may get banned so that the site can’t be accessed through the ISP’s firewall. There are many reasons that can leave us with no access to the server, and as a precaution we need to take backups of the server regularly.

For GitLab, a backup means a backup of both the configuration and the application data. More details.

To take backups, first we need to decide on a policy for storing them. Yes, this is very important.

It’s possible to take backups regularly, download the archive files to a local computer and keep them in a convenient place. But it’s better to have an automated process for this, and we should keep in mind that our local machines can crash too, and if that happens, all the backups will go in vain. For this reason, we should pick a provider like AWS S3, Google Cloud Storage or Rackspace Cloud Files to store our backups. They have disaster recovery processes, and there is almost no chance that a file uploaded to these services will be lost, unless we explicitly delete it. GitLab depends on Fog for uploading backups to remote locations, and Fog supports only the above three cloud providers along with local storage. Comparing the pricing of the three, and considering that mostly we will just upload and store the files while downloads will not happen very often, I found that AWS S3 has the lowest pricing. Refer to the AWS S3 pricing, Google Cloud Storage pricing and Rackspace Cloud Files pricing. Observe that there are several storage classes in AWS S3 and Google Cloud Storage. We can decide that initially we will upload the files to Standard storage, after some time move them to Glacier storage in AWS (or Nearline storage in Google), and after a long period, say one year, actually delete the files, because after such a long time an old backup has no relevance any more. In S3 we can set such a rule explicitly, as we will see.

I’ve used AWS S3 to preserve the backed up data. If you want to use any other provider, this guide may not help you that much.

Let’s start.

Create one AWS account. If you’re already an Amazon customer, you may either link the AWS account with the existing customer account or create a fresh one with a new email. Amazon also gives a 12-month free tier upon signup of a new account. A credit or debit card eligible for online transactions is required here. Open the AWS Console, click on the link/button Sign In to the Console, complete the sign-up procedure, and on the last page click on the link, something like Sign In to Console. Make sure your email is in the username box, select the returning user radio button and enter the password.

To create a bucket (think of a bucket as a bowl to put your backups in), go here. Click on the Create Bucket button. Give the bucket name you decided on and select a region (the choice of region doesn’t matter much unless you have some constraint), and if you need logging, click on Enable Logging; otherwise click on Create. Your bucket is ready to take files.
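
By the way, if you have the AWS CLI installed and configured on some machine, the bucket can also be created from the command line. This is just a sketch; replace <bucket-name> and the region with your own choices.

# assumes the AWS CLI is installed and configured with suitable credentials
aws s3 mb s3://<bucket-name> --region us-west-2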

Here we have to set the Lifecycle of the files. Click on the bucket just created and click on the Properties button if it’s not already selected. Click Lifecycle and create one rule. The wizard is self-explanatory; fill it in as you like. I’ve set it so that after 30 days the files move to Glacier storage, and 365 days after creation they are deleted.
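
The same lifecycle rule can also be applied from the command line instead of the wizard. Again just a sketch, with <bucket-name> as a placeholder; the exact JSON shape accepted may vary slightly between CLI versions.

# write the lifecycle rule to a file:
# move files to Glacier after 30 days, delete them after 365 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Prefix": "",
      "Status": "Enabled",
      "Transitions": [{ "Days": 30, "StorageClass": "GLACIER" }],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

# apply the rule to the bucket
aws s3api put-bucket-lifecycle-configuration --bucket <bucket-name> --lifecycle-configuration file://lifecycle.json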

Now, let’s move on to creating a user for the automated backup process. It’s not recommended to use the root user, i.e. the user you have logged in with, for any of the automated tasks, because if someone with malicious intent gets access to our server, severe damage can happen. So, create one IAM user here. Click on the Users tab on the left and click on Create New User. You can create up to 5 users at a time; here we need only one. Give the username and keep the checkbox Generate an access key for each user checked. Click Create, and on the next page download the credentials by clicking on the button Download Credentials, then click Close. It’s better to create a group with a proper inline policy, so that it’s easier to manage user permissions.

Click on the Groups tab on the left and create one group. Don’t attach any policy here, just create the group. After creation, click on the group to open its details. Click on Permissions and open the accordion Inline Policies. Click on the Click Here link and select the button corresponding to Policy Generator. Select S3 from the AWS Service dropdown and select the following actions for the resources specified below.

# Resource - "arn:aws:s3:::<bucket-name>/*" 
"s3:AbortMultipartUpload",
"s3:GetBucketAcl",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:GetObjectAcl",
"s3:ListBucketMultipartUploads",
"s3:PutObject",
"s3:PutObjectAcl"

# Resource - "*"
"s3:GetBucketLocation",
"s3:ListAllMyBuckets"

# Resource - "arn:aws:s3:::<bucket-name>"
"s3:ListBucket"

After entering each resource name, click on Add Statement; finally click on Next Step, give the policy a name and create it. The policy will be listed in the Permissions tab of the group.
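
For reference, the inline policy produced by the generator should end up looking roughly like this, with <bucket-name> replaced by the name of your bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketAcl",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>"
    }
  ]
}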

Click on the Users tab in the group details page. Click Add Users to Group to add the user created previously.
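
If you’d rather script this part, roughly the same thing can be done with the AWS CLI. This is only a sketch: the user, group and policy names below are examples, and policy.json is assumed to contain the policy document shown above.

# create the user and its access key (note down the key id and secret)
aws iam create-user --user-name gitlab-backup
aws iam create-access-key --user-name gitlab-backup

# create the group, attach the inline policy and put the user into the group
aws iam create-group --group-name gitlab-backup
aws iam put-group-policy --group-name gitlab-backup \
    --policy-name gitlab-backup-s3 --policy-document file://policy.json
aws iam add-user-to-group --group-name gitlab-backup --user-name gitlab-backup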

Now we have to configure our GitLab server to upload the backups to AWS. Open /etc/gitlab/gitlab.rb in your favourite editor, search for the text manage_backup_path and make the block look like the one below. More Details.

gitlab_rails['manage_backup_path'] = true
gitlab_rails['backup_path'] = "/path/to/backup/location/in/server"
# gitlab_rails['backup_archive_permissions'] = 0644 # See: http://doc.gitlab.com/ce/raketasks/backup_restore.html#backup-archive-permissions
# gitlab_rails['backup_pg_schema'] = 'public'
# gitlab_rails['backup_keep_time'] = 604800
gitlab_rails['backup_upload_connection'] = {
  'provider' => 'AWS',
  'region' => 'us-west-2',
  'aws_access_key_id' => '_secret_id_',
  'aws_secret_access_key' => '_secret_key_'
}
gitlab_rails['backup_upload_remote_directory'] = 'bucket-name'
gitlab_rails['backup_multipart_chunk_size'] = 104857600
gitlab_rails['backup_encryption'] = 'AES256' # Turns on AWS Server-Side Encryption with Amazon S3-Managed Keys for backups

For the codenames of the AWS regions, refer to this. The AWS access key ID and secret access key can be found in the credentials file downloaded when the IAM user was created. Replace bucket-name with the proper name of the bucket you created earlier.

Almost done. Now, we have to run only two commands.

# reconfigure GitLab with the modified configurations
sudo gitlab-ctl reconfigure

# create first backup of the server
sudo gitlab-rake gitlab:backup:create

If all the steps have been configured correctly, you should be able to see a .tar file in the web view of the bucket.
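
If the AWS CLI is configured somewhere with the same credentials, you can also check the bucket from the command line:

# list the contents of the backup bucket
aws s3 ls s3://<bucket-name>/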

But, we don’t want to take backups manually. So, we will set a cron job on our server. Invoke the command sudo crontab -e -u root, or you may want to specify the user ID running the GitLab server. If the crontab has not been configured yet, it will ask for the default editor you want to use; select one by entering its number. The cron file will then be opened in the selected editor.

Now, we have to understand a little bit about the syntax of cron. It’s very easy, just read this once. With the knowledge we’ve just acquired, let’s add one line to the end of the cron file and save it. Cron will automatically create one cron job for it.

# run the backup command every week on day 7, i.e. Sunday
# (in cron both 0 and 7 stand for Sunday), at 23:59 hrs;
# the backup command will take care of creating backups
# and uploading them to AWS S3
59 23 * * 7 /usr/bin/gitlab-rake gitlab:backup:create

This completes the automation of backing up the application data, i.e. repositories, databases etc.

We are not finished yet. We also need to take a backup of the /etc/gitlab folder. But, as this folder does not change very frequently, I’ve decided to back it up manually. Especially if you have many users, you may want to set up a cron job to create archives of this folder at regular intervals, for example with an entry like the one sketched below. Refer to this.
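
A weekly cron entry for root along the following lines would do; this is only a sketch, and the destination directory /var/backups/gitlab-config is just an example that must exist beforehand.

# archive the GitLab configuration every Sunday at 02:00;
# note that % must be escaped as \% inside a crontab entry
0 2 * * 0 umask 0077; tar -czf /var/backups/gitlab-config/$(date "+etc-gitlab-\%s.tgz") -C / etc/gitlab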

Code happily and be happy. :-)
