How to Automate MongoDB Backups Using Kubernetes
In this blog post, we will guide you step by step through backing up and restoring MongoDB databases running in a Kubernetes environment.
MongoDB is an open-source NoSQL database. It stores data as JSON-like documents with optional schemas, handles large amounts of data with ease, and scales easily.
Understanding the Basics
Before continuing with this article, some basic understanding of the matter is needed. If you have experience with popular relational database systems such as MySQL, you will find some similarities when working with MongoDB.
The first thing you should know is that MongoDB uses the JSON and BSON (Binary JSON) formats for storing its information. JSON is the human-readable format, which is perfect for exporting and, eventually, importing your data. You can further manage your exported data with any tool that supports JSON, including a simple text editor.
An example JSON document looks like this:
{"address":[
{"building":"1007", "street":"Park Ave"},
{"building":"1008", "street":"New Ave"},
]}
JSON is very convenient to work with, but it does not support all the data types available in BSON. This means there will be a so-called 'loss of fidelity' if you use JSON. For backing up and restoring, it's better to use the binary BSON format.
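To illustrate the difference, mongoexport writes human-readable JSON, while mongodump writes BSON. The connection string, database, and collection names below are only placeholders:
# JSON export of a single collection: readable, but some type information can be lost
mongoexport --uri="mongodb://localhost:27017/mydb" --collection=address --out=address.json
# BSON dump of the whole database: preserves all BSON types, preferred for backups
mongodump --uri="mongodb://localhost:27017/mydb" --out=/tmp/dump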
Second, you don't have to worry about explicitly creating a MongoDB database. If the database you specify for import doesn't already exist, it is created automatically. The same goes for the structure of the collections (the equivalent of database tables): in contrast to other database engines, MongoDB creates it automatically when the first document (the equivalent of a database row) is inserted, as the example below shows.
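A quick illustration using the mongosh shell, where newdb and address are made-up names:
# neither the "newdb" database nor the "address" collection exists yet; the insert creates both
mongosh "mongodb://localhost:27017/newdb" --eval 'db.address.insertOne({ building: "1007", street: "Park Ave" })'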
Third, reading or inserting large amounts of data in MongoDB, such as for the tasks in this article, can be resource-intensive and consume much of your CPU, memory, and disk space. This is critical, considering that MongoDB is frequently used for large databases and Big Data. The simplest solution to this problem is to run exports and backups at night or during other non-peak hours.
Fourth, data consistency can be problematic if you have a busy MongoDB server whose data changes during the export or backup process. There is no simple solution to this problem, but at the end of this article you will find recommendations for further reading about replication.
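One mitigation worth knowing about: when dumping a whole replica set member, mongodump's --oplog option records the writes that happen while the dump is running, and mongorestore --oplogReplay applies them on restore. A sketch with placeholder credentials (note that --oplog only works for full-instance dumps against a replica set, not for a single database or collection):
# point-in-time dump of a replica set member
mongodump --uri="mongodb://user:password@replica-host:27017" --oplog --out=/tmp/dump
# replay the captured oplog entries when restoring
mongorestore --uri="mongodb://user:password@replica-host:27017" --oplogReplay /tmp/dump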
While you can use the import and export functions to back up and restore your data, there are better ways to ensure the full integrity of your MongoDB databases. To back up your data, use the mongodump command. For restoring, use mongorestore. Let's see how they work.
Step 1: Create a Base Container
Backing Up a MongoDB Database: Creating the Dump Script
dump.sh
#!/bin/bash
echo "******************************************************"
echo "Starting-BACKUP"
echo "******************************************************"
NOW="$(date +"%F")-$(date +"%T")"
FILE="$DB_NAME-$NOW"
mongodump --uri="$MONGODB_URI" --out="/mongodump/db/$FILE"
sleep 30
echo "End-BACKUP"
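If you want to test the script outside Kubernetes, you can run it by hand. A minimal sketch, assuming a local /mongodump/db directory exists and the connection string is replaced with your own:
export DB_NAME=microfunctions
export MONGODB_URI="mongodb://microfunctions:UxXKmC9EAn@host-mongodb:27017/microfunctions"
./dump.sh
# the dump is written to /mongodump/db/microfunctions-<date>-<time>/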
Restoring a MongoDB Database: Creating the Restore Script
restore.sh
#!/bin/bash
echo "******************************************************"
echo "Starting-RESTORE"
echo "******************************************************"
# Pass the name of an existing dump directory (created by dump.sh) as the first argument, e.g.:
#   ./restore.sh microfunctions-2021-05-15-13:35:07
BACKUP_DIR="$1"
# --drop removes existing collections before restoring them
mongorestore --uri="$MONGODB_URI" --drop "/mongodump/db/$BACKUP_DIR/$DB_NAME"
echo "End-RESTORE"
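To restore, point the script at one of the dump directories created by dump.sh. A minimal sketch, assuming the same environment variables and that the backup volume is available at /mongodump (the directory name is the example from the backup logs later in this article):
export DB_NAME=microfunctions
export MONGODB_URI="mongodb://microfunctions:UxXKmC9EAn@host-mongodb:27017/microfunctions"
./restore.sh microfunctions-2021-05-15-13:35:07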
Writing the Dockerfile
Dockerfile
FROM mongo

# Directory that will hold the backup/restore scripts
WORKDIR /usr/src/configs

# Copy the scripts into the image and make them executable
COPY dump.sh .
RUN chmod +x dump.sh

COPY restore.sh .
RUN chmod +x restore.sh
Pushing the Docker Container Image to Docker Hub
You can build and push your own image (see the sketch below), or use my Docker image:
https://hub.docker.com/r/microfunctions/microfunctions-mongodump
https://github.com/microfunctionsio/microfunctions-mongodump
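If you prefer to build and push your own image, the standard Docker workflow is enough. A minimal sketch, where <your-dockerhub-username> and the image name mongodump-backup are placeholders:
# build the image from the directory containing the Dockerfile, dump.sh and restore.sh
docker build -t <your-dockerhub-username>/mongodump-backup:latest .
# log in and push the image so your cluster can pull it
docker login
docker push <your-dockerhub-username>/mongodump-backup:latest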
Step 2: Add a PersistentVolumeClaim
Understanding the basics: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
You must have an existing volume in your cluster, which you can create with a PersistentVolumeClaim (PVC). For the purposes of this tutorial, presume we have already created a PVC by calling kubectl create -f your_pvc_file.yaml with a YAML file that looks like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-backup
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: hostpath
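Assuming the manifest above is saved as your_pvc_file.yaml, you can create the claim and check that it gets bound:
kubectl create -f your_pvc_file.yaml
kubectl get pvc mongodb-backup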
Step 3: Create a CronJob
You can use CronJobs for cluster tasks that need to be executed on a predefined schedule. As the documentation explains, they are useful for periodic and recurring tasks, like running backups, sending emails, or scheduling individual tasks for a specific time, such as when your cluster is likely to be idle.
As with Jobs, you can create CronJobs via a definition file. The following is the CronJob file cron-mongodump-backup.yaml. Use this file to create an example CronJob:
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mongodump-backup
spec:
  schedule: "0 */6 * * *" # run the backup every 6 hours
  startingDeadlineSeconds: 60
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: mongodump-backup
              image: microfunctions/microfunctions-mongodump
              imagePullPolicy: "IfNotPresent"
              env:
                - name: DB_NAME
                  value: "microfunctions"
                - name: MONGODB_URI
                  value: mongodb://microfunctions:UxXKmC9EAn@host-mongodb:27017/microfunctions
              volumeMounts:
                - mountPath: "/mongodump"
                  name: mongodump-volume
              command: ["sh", "-c", "./dump.sh"]
          restartPolicy: OnFailure
          volumes:
            - name: mongodump-volume
              persistentVolumeClaim:
                claimName: mongodb-backup
Apply the CronJob to your cluster:
kubectl apply -f cron-mongodump-backup.yaml
cronjob.batch/mongodump-backup created
Verify that the CronJob was created with the schedule in the definition file:
kubectl get cronjob mongodump-backup
NAME               SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mongodump-backup   */5 * * * *   False     0        3m36s           8m47s
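If you don't want to wait for the next scheduled run, recent kubectl versions can trigger a one-off Job from the CronJob (the Job name mongodump-backup-manual is just an example):
kubectl create job --from=cronjob/mongodump-backup mongodump-backup-manual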
Show the jobs:
kubectl get job -n microfunctions
NAME                          COMPLETIONS   DURATION   AGE
mongodump-backup-1621085400   1/1           63s        10m
mongodump-backup-1621085700   1/1           34s        5m19s
Show the job logs:
kubectl logs mongodump-backup-1621085700-k9hz5
Starting-BACKUP
2021-05-15T13:35:07.234+0000  writing microfunctions.statushists to /mongodump/db/microfunctions-2021-05-15-13:35:07/microfunctions/statushists.bson
2021-05-15T13:35:07.245+0000  done dumping microfunctions.statushists (5 documents)
2021-05-15T13:35:07.246+0000  writing microfunctions.clusters to /mongodump/db/microfunctions-2021-05-15-13:35:07/microfunctions/clusters.bson
2021-05-15T13:35:07.247+0000  done dumping microfunctions.clusters (1 document)
2021-05-15T13:35:07.248+0000  writing microfunctions.users to /mongodump/db/microfunctions-2021-05-15-13:35:07/microfunctions/users.bson
2021-05-15T13:35:07.248+0000  writing microfunctions.functions to /mongodump/db/microfunctions-2021-05-15-13:35:07/microfunctions/functions.bson
2021-05-15T13:35:07.249+0000  writing microfunctions.sourcecodes to /mongodump/db/microfunctions-2021-05-15-13:35:07/microfunctions/sourcecodes.bson
2021-05-15T13:35:07.249+0000  writing microfunctions.namespaces to /mongodump/db/microfunctions-2021-05-15-13:35:07/microfunctions/namespaces.bson
2021-05-15T13:35:07.250+0000  done dumping microfunctions.functions (1 document)
2021-05-15T13:35:07.250+0000  done dumping microfunctions.users (1 document)
2021-05-15T13:35:07.251+0000  done dumping microfunctions.sourcecodes (1 document)
2021-05-15T13:35:07.252+0000  done dumping microfunctions.namespaces (1 document)
End-BACKUP
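To restore one of these dumps inside the cluster, one option is a one-off Job that mounts the same PVC and calls restore.sh. The following is only a sketch: it reuses the image and environment from the CronJob above, and the dump directory name in the command is the example from the logs, so adjust it to the backup you actually want to restore.
apiVersion: batch/v1
kind: Job
metadata:
  name: mongodump-restore
spec:
  template:
    spec:
      containers:
        - name: mongodump-restore
          image: microfunctions/microfunctions-mongodump
          env:
            - name: DB_NAME
              value: "microfunctions"
            - name: MONGODB_URI
              value: mongodb://microfunctions:UxXKmC9EAn@host-mongodb:27017/microfunctions
          volumeMounts:
            - mountPath: "/mongodump"
              name: mongodump-volume
          # restore the dump directory created by the backup above
          command: ["sh", "-c", "./restore.sh microfunctions-2021-05-15-13:35:07"]
      restartPolicy: OnFailure
      volumes:
        - name: mongodump-volume
          persistentVolumeClaim:
            claimName: mongodb-backup
Apply it with kubectl apply -f and check the Job's logs the same way as above.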