Replication in MongoDB

June 18, 2021

Today many systems handle large volumes of data and users, such as social networks and banking systems, and they need to stay up through any eventuality, say a power outage or a network equipment failure. For us as users such failures would be critical. Can you imagine an event like that at the bank where you are a customer, and all your money just disappearing? Or what if all the photos on your favorite social network were erased one day? In an environment this prone to failure, all of these situations can arise, and in fact they do, but they stay transparent to users because service providers implement replication and high-availability solutions ahead of time to prevent them.

Replication is the process of copying and maintaining database objects across the multiple databases that make up a distributed system. It improves performance and protects the availability of applications by providing alternative ways to access the data.

All modern database management systems offer ways to provide high availability and replication, which makes them useful when failures occur. Many of them, if not most, need third-party tools to provide a robust and efficient mechanism, and at the programming level this can make life harder for programmers, since configuring and testing becomes tedious. The creators and contributors of MongoDB, however, give us a fairly simple way to get high availability and replication.

In MongoDB, replication is the way to provide high availability and fault tolerance natively and transparently to the applications that use it as their database manager. Programmers do not need to understand what happens behind the scenes; they only need to know that the mechanism is there and that it is quite robust and efficient. Replication is built on a collection of MongoDB instances, or nodes, called a replica set, in which one node must always be active as the primary.

[Figure: MongoDB replication process]

The recommended minimum number of nodes for a replica set is three. If the primary fails, an election is triggered among the remaining nodes to pick a single substitute that keeps providing the service. With only two nodes, the surviving node cannot gather a majority of the votes, so no new primary is elected and the set is left without a node that accepts writes.
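
The majority rule is easy to check with a little arithmetic. The following is only a toy sketch, not MongoDB code: the function names `majorityOf` and `canElectPrimary` are made up for this example, and it assumes every member holds exactly one vote.

```javascript
// Toy model of the majority rule used in replica set elections.
// Assumption: every member has exactly one vote.

// Votes needed to elect a primary in a set with `totalVoters` members.
function majorityOf(totalVoters) {
  return Math.floor(totalVoters / 2) + 1;
}

// True when the members that are still reachable can elect a new primary.
function canElectPrimary(totalVoters, reachableVoters) {
  return reachableVoters >= majorityOf(totalVoters);
}

// Two-node set, one node lost: the survivor has 1 vote but needs 2.
console.log(canElectPrimary(2, 1)); // false

// Three-node set, one node lost: the two survivors reach the majority of 2.
console.log(canElectPrimary(3, 2)); // true
```

This is why adding a third member, even one that only votes, is enough to survive the loss of a single node.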

[Figure: MongoDB primary election process]

Types of Nodes

  • Regular: nodes that hold the data and can be either primary or secondary.
  • Arbiter: nodes that hold no data and exist only to vote; they make it possible to choose a new primary when a failure occurs.
  • Delayed: nodes that deliberately stay behind the other nodes by a user-defined interval, which is useful for disaster recovery.
  • Hidden: nodes that are not visible to client applications, used mostly for analytics.

[Figure: MongoDB node types]
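
As a rough illustration, each node type maps to options on a member in the replica set configuration document. The sketch below assumes a five-member set with placeholder host names under `example.net`; the helper `describeMember` is invented for this example. `arbiterOnly`, `hidden` and `priority` are member options of the configuration document, and the delay option is called `secondaryDelaySecs` in MongoDB 5.0 and later (`slaveDelay` in earlier versions).

```javascript
// Hypothetical five-member layout; all host names are placeholders.
const nodeTypesConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db0.example.net:27017" },
    { _id: 1, host: "db1.example.net:27017" },
    // Arbiter: votes in elections but holds no data.
    { _id: 2, host: "arbiter.example.net:27017", arbiterOnly: true },
    // Delayed: applies operations one hour late; it can never become primary.
    { _id: 3, host: "delayed.example.net:27017", priority: 0, hidden: true, secondaryDelaySecs: 3600 },
    // Hidden: invisible to clients, handy for analytics workloads.
    { _id: 4, host: "analytics.example.net:27017", priority: 0, hidden: true },
  ],
};

// Classify a member by looking at its options (illustrative helper).
function describeMember(member) {
  if (member.arbiterOnly) return "arbiter";
  if (member.secondaryDelaySecs > 0 || member.slaveDelay > 0) return "delayed";
  if (member.hidden) return "hidden";
  return "regular";
}

nodeTypesConfig.members.forEach((m) => {
  console.log(`${m.host} -> ${describeMember(m)}`);
});
```

In the mongo shell, a document shaped like this could be passed to `rs.initiate()` or applied to an existing set with `rs.reconfig()`.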

Replication process

MongoDB keeps a special collection that logs every operation that modifies data, called the "oplog" (operations log). Write operations are performed first on the primary node and recorded in its oplog; the secondaries then copy those oplog entries and apply the operations asynchronously.

All members of the set keep a copy of the oplog in the collection "local.oplog.rs", which they use to keep their databases up to date. Members exchange heartbeats (pings) with each other, and any member can import oplog entries from any other node in the set. So, in case of a failure, if node "A" comes back as a secondary after a fairly long period and the oplog on the new primary "B" has already rolled over, "A" can no longer catch up from the oplog alone and has to copy all the data from "B" again.

Apart from the oplog, MongoDB implements two types of synchronization to keep all the nodes of a replica set up to date. The first is initial sync, which loads new members with the full data set; the second is replication, which keeps the set updated after the initial sync.
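
The idea behind oplog-based replication can be sketched with a small in-memory model. This is not how MongoDB is implemented; the `Node` class and the functions `writeOnPrimary` and `syncFrom` are invented here, and timestamps are plain counters for simplicity.

```javascript
// Toy in-memory model of oplog-based replication (not MongoDB code).
class Node {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.oplog = [];       // ordered list of { ts, op, doc }
  }

  // Timestamp of the last operation this node has applied.
  lastTs() {
    return this.oplog.length ? this.oplog[this.oplog.length - 1].ts : 0;
  }

  // Apply one oplog entry to the data and record it in the local oplog.
  apply(entry) {
    if (entry.op === "insert" || entry.op === "update") {
      this.data.set(entry.doc._id, entry.doc);
    } else if (entry.op === "delete") {
      this.data.delete(entry.doc._id);
    }
    this.oplog.push(entry);
  }
}

// A write on the primary changes the data and is logged in the oplog.
function writeOnPrimary(primary, op, doc) {
  primary.apply({ ts: primary.lastTs() + 1, op, doc });
}

// Replication: the secondary pulls the entries newer than its own last one.
function syncFrom(secondary, source) {
  const missing = source.oplog.filter((e) => e.ts > secondary.lastTs());
  missing.forEach((e) => secondary.apply(e));
  return missing.length;
}

const primary = new Node("A");
const secondary = new Node("B");
writeOnPrimary(primary, "insert", { _id: 1, name: "ana" });
writeOnPrimary(primary, "update", { _id: 1, name: "Ana" });

console.log(syncFrom(secondary, primary)); // 2
console.log(secondary.data.get(1).name);   // Ana
```

A member whose last timestamp is older than the oldest entry still available on the source cannot use this incremental path, which corresponds to the case where it needs the full data set copied again.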

[Figure: MongoDB replication process]

Writes and write concerns

In the default configuration, writes are always directed to the primary. How they are acknowledged, however, can be configured both on the driver connection and on individual insert or update calls, using the following values:

  • 0: does not wait for confirmation that the write succeeded and always returns a successful status.
  • 1: the traditional default (MongoDB 5.0 and later default to majority); returns a successful status once the primary node acknowledges the write.
  • majority: write operations return a successful status only when the majority of the voting nodes in the set acknowledge the write.
  • n: write operations return a successful status only when the number of nodes given as "n" acknowledge the write.

[Figure: MongoDB write process on a replica set]

It is important to note that writes cannot complete while there is no primary. There can also be cases in which MongoDB must roll back data, when it detects an inconsistency between the node that used to be primary and the one that took its place. Another point worth knowing: if the number of nodes requested for acknowledgment is greater than the number of nodes in the set, the write can wait forever unless a timeout (wtimeout) is set.
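
To make the acknowledgment rules concrete, here is a small sketch that works out how many acknowledgments each value requires and whether a set can ever deliver them. The helpers `requiredAcks`, `canBeSatisfied` and `writeConcern` are invented for this example, and the model assumes every member both votes and holds data.

```javascript
// Toy model of write concern resolution (not MongoDB code).
// Assumption: every member of the set votes and holds data.

// Number of acknowledgments a write needs for a given `w` value.
function requiredAcks(w, members) {
  if (w === "majority") return Math.floor(members / 2) + 1;
  return w; // numeric values: 0, 1, ..., n
}

// False when the set can never produce the requested acknowledgments.
function canBeSatisfied(w, members) {
  return requiredAcks(w, members) <= members;
}

// Build a write concern document; `wtimeout` (milliseconds) bounds the wait.
function writeConcern(w, wtimeoutMs) {
  return { w: w, wtimeout: wtimeoutMs };
}

console.log(requiredAcks("majority", 3)); // 2
console.log(canBeSatisfied(4, 3));        // false
console.log(writeConcern("majority", 5000));

// In the mongo shell the document would be passed with the write, e.g.
// (the "orders" collection is hypothetical):
//   db.orders.insertOne({ item: "book" },
//                       { writeConcern: writeConcern("majority", 5000) });
```

Setting a timeout is a cheap safeguard: the call then returns an error after the given time instead of blocking the application.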

Reads and read preferences

By default, MongoDB sends its reads to the primary, which provides strong consistency between what is written and what is read. This behavior can be changed in the configuration according to the needs of the application:

  • primary: the default mode; all read operations go to the primary.
  • primaryPreferred: read operations go to the primary, and to the secondaries only when no primary is available at the moment.
  • secondary: all read operations go to the secondary nodes.
  • secondaryPreferred: read operations go to the secondaries, and to the primary only when no secondary is available at the moment.
  • nearest: read operations go to the member of the replica set with the lowest network latency, regardless of whether it is primary or secondary.

[Figure: MongoDB read process on a replica set]
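
The five modes can be summarized as a selection function. This is a simplified sketch, not driver code: `chooseReadTarget` and the member objects (`host`, `role`, `latencyMs`) are invented here, and among several candidates it simply picks the one with the lowest latency, whereas real drivers apply more elaborate selection rules.

```javascript
// Toy model of read preference routing (not driver code).
// Simplification: among several candidates, pick the lowest latency.
function chooseReadTarget(mode, members) {
  const primary = members.find((m) => m.role === "primary") || null;
  const secondaries = members.filter((m) => m.role === "secondary");
  const fastest = (list) =>
    list.reduce((best, m) => (best === null || m.latencyMs < best.latencyMs ? m : best), null);

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || fastest(secondaries);
    case "secondary":
      return fastest(secondaries);
    case "secondaryPreferred":
      return fastest(secondaries) || primary;
    case "nearest":
      return fastest(members);
    default:
      throw new Error(`unknown read preference: ${mode}`);
  }
}

// Hypothetical set: one primary and two secondaries.
const members = [
  { host: "a", role: "primary", latencyMs: 20 },
  { host: "b", role: "secondary", latencyMs: 5 },
  { host: "c", role: "secondary", latencyMs: 12 },
];

console.log(chooseReadTarget("primary", members).host); // a
console.log(chooseReadTarget("nearest", members).host); // b

// With the primary down, primaryPreferred falls back to a secondary.
const withoutPrimary = members.filter((m) => m.role !== "primary");
console.log(chooseReadTarget("primaryPreferred", withoutPrimary).host); // b
```

Note that with the strict "primary" mode the function returns `null` when no primary exists, which mirrors the fact that such reads cannot be served until a new primary is elected.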

Aspects to consider when using MongoDB replica sets in applications

  1. Node lists: drivers need to know the members of the replica set (or at least one valid member) in order to work correctly. The list is supplied when the MongoDB driver for the language you are working with is initialized.
  2. Read preferences: the application must be prepared for reads that occasionally return stale data when they are spread across the secondaries.
  3. Write concerns: if an error occurs during a write, the driver may be left waiting indefinitely for the acknowledgment, which is quite critical; our application should be able to handle such situations.
  4. Errors: the application must be able to handle different kinds of exceptions, since it has to deal not only with the cases above but also with network errors and server-side configuration problems, to name a few.
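
Several of these points come together in the connection string handed to the driver. The sketch below builds one; the helper `buildReplicaSetUri` is invented for this example, and the host names reuse the ones from this article. `replicaSet`, `readPreference`, `w` and `wtimeoutMS` are standard MongoDB connection string options.

```javascript
// Build a replica set connection string from a node list and options.
function buildReplicaSetUri(hosts, options) {
  const query = Object.entries(options)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `mongodb://${hosts.join(",")}/?${query}`;
}

const uri = buildReplicaSetUri(
  [
    "mongodb0.rootstack.com:27017",
    "mongodb1.rootstack.com:27017",
    "mongodb2.rootstack.com:27017",
  ],
  {
    replicaSet: "rs0",                    // name of the set
    readPreference: "secondaryPreferred", // reads may return stale data
    w: "majority",                        // write concern
    wtimeoutMS: 5000,                     // do not wait forever for acknowledgment
  }
);

console.log(uri);
```

Listing more than one host means the driver can still discover the set when one of the listed nodes is down, and the write timeout turns an indefinite wait into an error the application can handle.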

Enough theory; let's create a replica set from the mongo console

  1. Start each member of the set by running the following from the console of each of our nodes: [prism:javascript] mongod --replSet "rs0" [/prism:javascript]
  2. Initiate the set from the mongo console of one of the members: [prism:javascript] rs.initiate(); [/prism:javascript]
  3. Check the configuration of our replica set: [prism:javascript] rs.conf(); [/prism:javascript] This command displays a result like the following: [prism:javascript] { "_id" : "rs0", "version" : 1, "members" : [ { "_id" : 1, "host" : "mongodb0.rootstack.com:27017" } ] } [/prism:javascript]
  4. Add the remaining mongo instances that are part of our replica set: [prism:javascript] rs.add("mongodb1.rootstack.com"); rs.add("mongodb2.rootstack.com"); . . . rs.add("mongodbN.rootstack.com"); [/prism:javascript]
  5. Done, we have a fully functional replica set. We can check its status from any of the nodes with the command: [prism:javascript] rs.status(); [/prism:javascript]
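
As an alternative to adding members one by one, `rs.initiate()` also accepts a configuration document that lists every member up front. The sketch below only builds that document in plain JavaScript; the helper `buildReplicaSetConfig` is invented for this example, and the three host names and the port 27017 are assumed from the steps above.

```javascript
// Build a replica set configuration document from a list of hosts.
function buildReplicaSetConfig(name, hosts) {
  return {
    _id: name,
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

const config = buildReplicaSetConfig("rs0", [
  "mongodb0.rootstack.com:27017",
  "mongodb1.rootstack.com:27017",
  "mongodb2.rootstack.com:27017",
]);

console.log(JSON.stringify(config, null, 2));

// In the mongo shell of one of the members, the document would then be
// used like this:
//   rs.initiate(config);
```

Each member needs a unique `_id`, which the helper derives from the position of the host in the list.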

In conclusion, MongoDB's high-availability system is quite convenient and easy to deploy, and its functionality is robust and efficient. It gives us a distributed environment without having to worry about strange configurations across the many components that usually have to be linked together to provide a service of this kind. Moreover, the project is constantly being improved and kept up to date with the needs of programmers, it has a very large community behind it, and it is increasingly used by companies in different industries, all of which gives us confidence that choosing it as our database manager was a good decision.

I hope you now have a better understanding of how replication works in MongoDB. Next I'll be posting about the aggregation framework, a feature that allows query operations such as group by (much like SQL queries) in MongoDB, yes, a NoSQL database.