Real-Time Search and Analytics with Confluent, Azure, Redis, and Spring Cloud

Self-managing a distributed system like Apache Kafka®, along with building and operating Kafka connectors, is complex and resource intensive. It requires significant Kafka skills and expertise in your organization's development and operations teams. Additionally, the higher the volume of real-time data you work with, the more challenging it becomes to ensure that all of the infrastructure scales efficiently and runs reliably.

Confluent and Microsoft are working together to make the process of adopting event streaming easier than ever by alleviating the typical infrastructure management needs that often pull developers away from building critical applications. With Azure and Confluent seamlessly integrated, you can collect, store, and process event streams in real time and feed them to multiple Azure data services. The integration helps reduce the burden of managing resources across Azure and Confluent.

[Read More]

PostgreSQL pgoutput plugin for change data capture

Set up a Change Data Capture architecture on Azure using Debezium, Postgres and Kafka was a tutorial on how to use Debezium to capture change data events from Azure PostgreSQL and send them to Azure Event Hubs for Kafka - it used the wal2json output plugin.

What about the pgoutput plugin?

This blog provides a quick walkthrough of how to use the pgoutput plugin and clarifies this point raised by Denis Arnaud (thank you for bringing it up!).
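
As a rough sketch of the kind of change the walkthrough covers, the snippet below registers a Debezium PostgreSQL connector through the Kafka Connect REST API with plugin.name set to pgoutput instead of wal2json. The hostname, credentials, table name, server name, and the Connect endpoint (assumed here to be localhost:8083) are all placeholders, and exact property names can differ between Debezium versions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Connector configuration: essentially the same as a wal2json setup,
	// except "plugin.name" selects PostgreSQL's native pgoutput plugin.
	// Hostname, credentials, and table name are placeholders.
	payload := map[string]interface{}{
		"name": "pg-cdc-connector",
		"config": map[string]string{
			"connector.class":      "io.debezium.connector.postgresql.PostgresConnector",
			"plugin.name":          "pgoutput",
			"database.hostname":    "<your-server>.postgres.database.azure.com",
			"database.port":        "5432",
			"database.user":        "<user>",
			"database.password":    "<password>",
			"database.dbname":      "postgres",
			"database.server.name": "mypgserver",
			// property name varies by Debezium version (table.whitelist in older releases)
			"table.include.list": "public.todos",
		},
	}

	body, err := json.Marshal(payload)
	if err != nil {
		log.Fatal(err)
	}

	// POST the connector definition to the Kafka Connect worker's REST API.
	resp, err := http.Post("http://localhost:8083/connectors", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("connector registration status:", resp.Status)
}
```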

[Read More]

Kafka on Kubernetes, the Strimzi way! (Part 4)

Welcome to part four of this blog series! So far, we have a single-node Kafka cluster with TLS encryption, on top of which we configured different authentication modes (TLS and SASL SCRAM-SHA-512), defined users with the User Operator, connected to the cluster using CLI and Go clients, and saw how easy it is to manage Kafka topics with the Topic Operator. Until now, our cluster used ephemeral persistence, which in the case of a single-node cluster means that we will lose data if the Kafka or ZooKeeper nodes (Pods) are restarted for any reason.

[Read More]

Kafka on Kubernetes, the Strimzi way! (Part 3)

Over the course of the first two parts of this blog series, we set up a single-node Kafka cluster on Kubernetes, secured it using TLS encryption, and accessed the broker using both internal and external clients. Let’s keep iterating! In this post, we will continue the Kafka on Kubernetes journey with Strimzi and cover:

  • How to apply different authentication types: TLS and SASL SCRAM-SHA-512
  • How to use the Strimzi Entity Operator to manage Kafka users and topics
  • How to configure Kafka CLI and Go client applications to securely connect to the Kafka cluster (see the sketch below)

The code is available on GitHub - https://github.com/abhirockzz/kafka-kubernetes-strimzi/
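
For a flavour of the client-side configuration, here is a minimal sketch of a Go client authenticating with SASL SCRAM-SHA-512 over TLS. Sarama is used here purely for illustration, the SCRAM adapter follows the pattern from Sarama's own examples, and the bootstrap address, CA certificate path, username, and password are placeholders:

```go
package main

import (
	"crypto/sha512"
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"

	"github.com/Shopify/sarama"
	"github.com/xdg-go/scram"
)

// SHA512 wraps the standard-library hash constructor for the SCRAM client.
var SHA512 scram.HashGeneratorFcn = sha512.New

// XDGSCRAMClient adapts the xdg-go/scram library to Sarama's SCRAMClient
// interface so the client can perform the SCRAM-SHA-512 handshake.
type XDGSCRAMClient struct {
	*scram.Client
	*scram.ClientConversation
	scram.HashGeneratorFcn
}

func (x *XDGSCRAMClient) Begin(userName, password, authzID string) (err error) {
	x.Client, err = x.HashGeneratorFcn.NewClient(userName, password, authzID)
	if err != nil {
		return err
	}
	x.ClientConversation = x.Client.NewConversation()
	return nil
}

func (x *XDGSCRAMClient) Step(challenge string) (string, error) {
	return x.ClientConversation.Step(challenge)
}

func (x *XDGSCRAMClient) Done() bool {
	return x.ClientConversation.Done()
}

func main() {
	// Placeholders: external bootstrap address, cluster CA certificate file,
	// and the credentials of a KafkaUser configured for SCRAM-SHA-512.
	broker := "<bootstrap-host>:9094"
	caCert, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caCert)

	config := sarama.NewConfig()
	config.Net.TLS.Enable = true
	config.Net.TLS.Config = &tls.Config{RootCAs: caPool}
	config.Net.SASL.Enable = true
	config.Net.SASL.Mechanism = sarama.SASLTypeSCRAMSHA512
	config.Net.SASL.User = "my-kafka-user"
	config.Net.SASL.Password = "<password>"
	config.Net.SASL.SCRAMClientGeneratorFunc = func() sarama.SCRAMClient {
		return &XDGSCRAMClient{HashGeneratorFcn: SHA512}
	}

	// Establish a connection to verify that authentication works.
	client, err := sarama.NewClient([]string{broker}, config)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	log.Println("authenticated; brokers in cluster:", len(client.Brokers()))
}
```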

[Read More]

Change Data Capture architecture using Debezium, Postgres and Kafka

Change Data Capture (CDC) is a technique used to track row-level changes in database tables in response to create, update, and delete operations. Different databases use different techniques to expose these change data events - for example, logical decoding in PostgreSQL, the binary log (binlog) in MySQL, etc. This is a powerful capability, but it is useful only if there is a way to tap into these event logs and make them available to other services that depend on that information.
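
To make the "tap into these event logs" part concrete: once a CDC connector is publishing change events to Kafka, a downstream service only needs to consume the corresponding topic. A minimal sketch using the Sarama Go client (chosen here for illustration), where the broker address and topic name are placeholders:

```go
package main

import (
	"fmt"
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	// Placeholders: Debezium typically publishes one topic per table,
	// named <server-name>.<schema>.<table>.
	brokers := []string{"localhost:9092"}
	topic := "mypgserver.public.todos"

	consumer, err := sarama.NewConsumer(brokers, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	// Read change events from partition 0, starting at the oldest offset.
	pc, err := consumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	for msg := range pc.Messages() {
		// Each message value is a JSON change event describing a row-level
		// create, update, or delete.
		fmt.Printf("offset %d: %s\n", msg.Offset, string(msg.Value))
	}
}
```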

[Read More]

Kafka on Kubernetes, the Strimzi way! (Part 2)

We kicked off the first part of the series by setting up a single-node Kafka cluster that was accessible only to internal clients within the same Kubernetes cluster, had no encryption, authentication, or authorization, and used ephemeral persistence. We will keep iterating/improving on this over the course of this blog series.

This part will cover these topics:

  • Expose Kafka cluster to external applications
  • Apply TLS encryption
  • Explore Kubernetes resources behind the scenes
  • Use Kafka CLI and Go client applications to test our cluster setup (see the sketch below)

The code is available on GitHub - https://github.com/abhirockzz/kafka-kubernetes-strimzi/
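
As a preview of the client-side testing step, here is a minimal sketch of a Go producer (using Sarama for illustration) that connects to a TLS-enabled external listener and sends a test message. It assumes the cluster CA certificate has been extracted to a local ca.crt file; the bootstrap address and topic name are placeholders:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"

	"github.com/Shopify/sarama"
)

func main() {
	// Placeholders: the external bootstrap address of the Kafka cluster and
	// the cluster CA certificate extracted from its Kubernetes Secret.
	broker := "<external-bootstrap-host>:9094"
	caCert, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caCert)

	config := sarama.NewConfig()
	config.Net.TLS.Enable = true
	config.Net.TLS.Config = &tls.Config{RootCAs: caPool}
	config.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer([]string{broker}, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Send a test message to verify encrypted connectivity end to end.
	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "test-topic",
		Value: sarama.StringEncoder("hello over TLS"),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("delivered to partition %d at offset %d", partition, offset)
}
```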

[Read More]

Kafka on Kubernetes, the Strimzi way! (Part 1)

Some of my previous blog posts (such as Kafka Connect on Kubernetes, the easy way!) demonstrate how to use Kafka Connect in a Kubernetes-native way. This is the first in a series of blog posts that will cover Apache Kafka on Kubernetes using the Strimzi Operator. In this post, we will start off with the simplest possible setup, i.e. a single-node Kafka (and ZooKeeper) cluster, and learn:

  • Strimzi overview and setup
  • Kafka cluster installation
  • Kubernetes resources used/created behind the scenes
  • Testing the Kafka setup using clients within the Kubernetes cluster

The code is available on GitHub - https://github.com/abhirockzz/kafka-kubernetes-strimzi

[Read More]

Data pipeline using MongoDB and Kafka Connect on Kubernetes

In Kafka Connect on Kubernetes, the easy way!, I demonstrated Kafka Connect on Kubernetes using Strimzi, along with the File source and sink connectors. This blog will showcase how to build a simple data pipeline with MongoDB and Kafka, using the MongoDB Kafka connectors deployed on Kubernetes with Strimzi.

I will be using a few Azure services for this pipeline. Please note that there are no hard dependencies on these components, and the solution should work with alternatives as well.
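
To give a feel for how the source side of such a pipeline can be exercised, the sketch below inserts a test document with the official MongoDB Go driver; with a MongoDB source connector watching this collection, the insert should surface as an event on a Kafka topic. The connection string, database, and collection names are placeholders:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Placeholder connection string: a local MongoDB instance or any
	// MongoDB API-compatible managed service.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Insert a test document; a MongoDB source connector configured for this
	// collection should emit a corresponding change event to Kafka.
	coll := client.Database("todosdb").Collection("todos")
	res, err := coll.InsertOne(ctx, bson.D{
		{Key: "description", Value: "try out the MongoDB Kafka source connector"},
		{Key: "status", Value: "pending"},
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("inserted document with id:", res.InsertedID)
}
```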

[Read More]