Statement-Based vs Row-Based Replication


Replication as most people know it, has mostly been SQL statement propagation from master to slave. This is known as “statement-based” replication. But there is also another kind of replication that is available, “the row-based replication” and that has quite a lot of benefits.

Statement-Based vs Row-Based Replication

Statement-based Replication

With statement-based replication, every SQL statement that could modify data is logged on the master. Then those SQL statements are replayed on the slaves against the same dataset and in the same context. The statement-based replication corresponds to the statement-based binary logging format.

Row-based Replication

With row-based replication, every “row modification” is logged on the master and is then applied on the slave. The keywords here are “row modification”, which implies that row-based replication is physical, in the sense that SQL statements that change the rows are not recorded, instead the entire updated row is written to the binary log. But there are a few exceptions, when a new table is created, dropped, or altered, the actual SQL statement is recorded. The row-based replication corresponds to row-based binary logging format.

Click here to see Binary Logging Formats

Below are the advantages and disadvantages of both the types of replication to help you choose the best one.

Advantages of Statement-based Replication

• There is always less data that is to be transferred between the master and the slave.
• There is less space taken up in the update logs.
• There is no need to deal with the row format.
• Also, auditing the database is easy, because statements that made any changes to the data are all logged in the binary log.

Disadvantages of Statement-based Replication

• The single biggest disadvantage of statement-based replication is the data-inconsistency issue between the master and the slave that creeps up due to the way this kind of replication works. Because we are logging the SQL statements, it is also necessary to log context information, so that the updates produce the same results on the slave as they did originally on the master. But in some cases it is not possible to provide any such context. Any nondeterministic behavior, is not going to have any such context present and hence is difficult to replicate using statement-based replication.
• “For example, for INSERT … SELECT with no ORDER BY, the SELECT may return rows in a different order (which results in a row having different ranks, hence getting a different number in the AUTO_INCREMENT column), depending on the choices made by the optimizers on the master and slave.”
• With statement-based replication, you are bound to encounter issues with replicating stored routines or triggers, and hence this kind of replication does not always work with stored routines and triggers.
• There is a performance penalty in the case of INSERT … SELECT, because in the case of statement-based replication this kind of statement requires a greater number of row-level locks as compared to row-based replication.

Advantages of Row-based Replication

Following are the major advantages of row-based replication:
• With row-based replication, each and every change can be replicated and hence this is the safest form of replication.
• Because every row update is physically logged, hence there no need for any context information. The only thing that is needed is to know which record is being updated and what is the update that is being written to that record.
• There are fewer row locks required on the master, which thus achieves high concurrency.
• The problems with auto_increment columns, timestamps, stored routines, and triggers don’t bother us with this kind of replication.

Disadvantages of Row-based replication

Following are the disadvantages of row-based replication:
• On a system that frequently updates large number of rows such as, UPDATE items set status=’Available’ where item_id BETWEEN 10000 and 30000; row-based replication produces very large update logs and generates a lot of network traffic between the master and the slave.
This kind of replication requires a lot of awareness of the internal row format.
• In cases of very large updates, the performance overhead associated with the increased I/O required to write large update logs could become unacceptable.
• You cannot examine the logs to audit changes to the database, because SQL statements are not logged, instead the data is logged in binary format.


<<Click here to see all posts>>

If you found any of the information on this page helpful in anyway then please consider sharing this content with your favorite social network or by leaving your thoughts in the comment section. Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *

Prove You Are Human Time limit is exhausted. Please reload CAPTCHA.