Change Data Capture (CDC)
Change Data Capture that is a set of software design patterns used to track the data efficiently. We can identify and capture data that has been added to, updated in, or removed from database tables making it available in a format consumable by ETL(Extract, transform, load), Enterprise application integration(EAI), or other types of data integration tools.
What is Change Data Capture CDC?
Change Data Capture is the process that ensures that changes made over time in one dataset are automatically transferred to another dataset.
It is an approach to data integration that is based on the identification, capture, and delivery of the changes made to enterprise data sources. Using CDC in our environments, you can preserve the state of data across time in any database or data repository system that allowed you to enable CDC.
Methods to Implement Change Data Capture (CDC)
- Synchronous change data capture using triggers: Triggers on the source database allow change data to be captured immediately, as each SQL statement that performs a data manipulation language (DML) operation (INSERT, UPDATE, or DELETE) is made. In this mode, change data is captured as part of the transaction modifying the source table.
- Asynchronous change data capture using redo logs: Asynchronous change data capture is implemented using redo logs. Redo logs are logs that keep a log of all activities in a database. As the name suggests, this is an asynchronous process that does not cause any performance hit. Taking advantage of the data sent to the redo log files, change data is captured after a SQL statement that performs a DML operation is committed. In this mode, change data is not captured as part of the transaction that is modifying the source table, and therefore has no effect on that transaction.
Advantages of Change Data Capture (CDC)
- Change Data Capture can capture all effects of INSERT, UPDATE, and DELETE operations made on schema, table or specific column. You can be specific data that you would like tracking.
- Asynchronous Change Data Capture can be configured to have minimal performance impact on the source database.
- Change Data Capture in almost all databases support publish-subscribe software design pattern, which provide easy-to-use interfaces.
- Change Data Capture reduces overhead cost because it simplifies the extraction of change data from the database.
- Change Data Capture minimizes disruptions to production workloads.
- Change Data Capture reduces the cost of transferring data over the wide area network (WAN) by sending only incremental changes.
- Change Data Capture enables faster, and more accurate decisions based on the most current data.
Why use change data capture?
- Consolidating data from multiples sources combined in one consolidated database.
- Enabling business intelligence, analytics, and reports with the least possible impact in production environment.
- Tracking your business operation in real-time or near real-time.
- Populating centralized database, data marts, data warehouses, or data lakes.
- Enabling Machine Learning, advanced analytics and AI on modern architectures like Hadoop and Spark.
- Real-time Business motorization allow creating alert and resolve problem in the shortest time.
Some Database management that support Change Data Capture
- Oracle database since version 9i to version 12.1.0.2
- SQL Server since version 2008 to current
- MySQL
- Postgresql
- MongoDB
References
Oracle CDC https://docs.oracle.com/cd/E11882_01/server.112/e25554/cdc.htm
MySQL Binlog https://dev.mysql.com/doc/internals/en/binary-log-overview.html
PostgreSQL WAL https://www.postgresql.org/docs/11/runtime-config-wal.html