Getting started with Rust and Redis

Are you learning Rust and looking for hands-on practice with concrete examples? A good approach is to integrate Rust with external systems. Why not try it with Redis? It is a powerful, versatile database, yet dead simple to get started with!

In this blog post, you will learn how to use the Rust programming language to interact with Redis using the redis-rs client. We will walk through commonly used Redis data structures such as String, Hash, and List. The Redis client used in the sample code exposes both high-level and low-level APIs, and you will see both styles in action.

[Read More]
Rust  Redis  Azure  NoSQL 

Learn how to set up a data pipeline from PostgreSQL to Cassandra using Kafka Connect

Apache Kafka often serves as a central component in the overall data architecture, with other systems pumping data into it. But data in Kafka topics is only useful when it is consumed by other applications or ingested into other systems. Although it is possible to build a solution using the Kafka Producer/Consumer APIs with a language and client SDK of your choice, there are other options in the Kafka ecosystem.

One of them is Kafka Connect, a platform to stream data between Apache Kafka and other systems in a scalable and reliable manner. It supports several off-the-shelf connectors, which means that you don't need custom code to integrate external systems with Apache Kafka.
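
To make the "configuration, not code" idea concrete, here is a minimal Go sketch (not from the original post) that registers a source connector with a Kafka Connect worker through its REST API, which workers expose on port 8083 by default. The connector name and the Debezium PostgreSQL settings are illustrative placeholders, and the exact config keys vary by connector version.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Everything Kafka Connect needs is declarative configuration.
	// Connector name, database, and table below are hypothetical.
	payload := map[string]interface{}{
		"name": "pg-source-connector",
		"config": map[string]string{
			"connector.class":    "io.debezium.connector.postgresql.PostgresConnector",
			"database.hostname":  "localhost",
			"database.port":      "5432",
			"database.user":      "postgres",
			"database.password":  "password",
			"database.dbname":    "orders_db",
			"table.include.list": "public.orders",
		},
	}

	body, err := json.Marshal(payload)
	if err != nil {
		log.Fatal(err)
	}

	// POST the config to the Kafka Connect worker's REST API.
	resp, err := http.Post("http://localhost:8083/connectors",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("Kafka Connect responded with:", resp.Status)
}
```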

[Read More]

Integrate Kafka and Cassandra using Kafka Connect

This blog post demonstrates how you can use an open source, connector-based solution to ingest data from Kafka into the Azure Cosmos DB Cassandra API. It uses a simple yet practical scenario along with a reusable Docker Compose setup to help with iterative development and testing. You will learn about:

  • An overview of Kafka Connect along with the details of the integration
  • How to configure and use the connector to work with Azure Cosmos DB
  • How to use the connector to write data to multiple tables from a single Kafka topic (sketched below)

By the end of the article, you should have a working end-to-end integration and be able to validate it.
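
As a taste of how a single topic can fan out to multiple tables, here is a rough sketch of the relevant sink connector settings expressed as a Go map. The key format follows the DataStax Apache Kafka Connector's topic.<topic>.<keyspace>.<table>.mapping convention, but every topic, keyspace, table, and column name below is a made-up placeholder; consult the connector documentation for your version.

```go
package main

import "fmt"

func main() {
	// Illustrative sink settings: one Kafka topic ("weather-data")
	// fanned out to two Cassandra tables via two mapping entries.
	// All topic, keyspace, and table names are hypothetical.
	config := map[string]string{
		"connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
		"topics":          "weather-data",
		"contactPoints":   "<cosmosdb-account>.cassandra.cosmos.azure.com",
		"port":            "10350", // Cassandra API port for Azure Cosmos DB
		"ssl.enabled":     "true",

		// Same topic, two tables: each mapping writes a subset of the record.
		"topic.weather-data.weather.data_by_state.mapping":   "station_id=value.stationid, temp=value.temp, state=value.state, ts=value.created",
		"topic.weather-data.weather.data_by_station.mapping": "station_id=value.stationid, temp=value.temp, ts=value.created",
	}

	for k, v := range config {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```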

[Read More]

Build a Serverless app using Go and Azure Functions

Webhook backends are a popular use case for FaaS (Functions-as-a-Service) platforms. They can power anything from sending customer notifications to responding with funny GIFs! Using a serverless function, it's quite convenient to encapsulate the webhook functionality and expose it in the form of an HTTP endpoint. In this tutorial, you will learn how to implement a Slack app as a serverless backend using Azure Functions and Go. You can extend the Slack platform and integrate services by implementing custom apps or workflows that have access to the full scope of the platform, allowing you to build powerful experiences in Slack.
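
To give a flavor of the approach, here is a minimal Go sketch of the Azure Functions custom handler pattern: the Functions host hands your process a port via the FUNCTIONS_CUSTOMHANDLER_PORT environment variable and forwards HTTP requests to it. The route is hypothetical, and a production Slack app would also need to verify Slack's request signatures, which is omitted here.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
)

// slackResponse is the minimal JSON payload Slack expects back
// from a slash command.
type slackResponse struct {
	ResponseType string `json:"response_type"`
	Text         string `json:"text"`
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Slack sends slash-command payloads as form-encoded POSTs;
	// "text" carries whatever the user typed after the command.
	text := r.FormValue("text")

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(slackResponse{
		ResponseType: "in_channel",
		Text:         "You said: " + text,
	})
}

func main() {
	// Azure Functions custom handlers receive the port to listen on
	// via the FUNCTIONS_CUSTOMHANDLER_PORT environment variable.
	port, ok := os.LookupEnv("FUNCTIONS_CUSTOMHANDLER_PORT")
	if !ok {
		port = "8080" // sensible default for local testing
	}

	// "/api/slack" is a hypothetical route; the actual route depends
	// on your function's name and configuration.
	http.HandleFunc("/api/slack", handler)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```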

[Read More]

Picking the Right Distributed Database

“In God we trust, all others must bring data”

William Edwards Deming

Well, Microsoft is bringing you Data Week 🙌 A celebration of Data & Data Technologies, running throughout the week, starting December 7, 2020!

It kicks off with Create: Data, a completely FREE online event.

Register at https://aka.ms/createdata !

Been wondering which database to pick for your next project/product?

Tim Berglund (who, by the way, I admire a lot!) from Confluent will be joining me in a conversation about "Picking the Right Distributed Database". Databases are a critical part of any business. But how do you pick the right one, given the multitude of options at your disposal and new ones coming up quite frequently? Should you stick to the good old RDBMS, opt for NoSQL variants, or go multi-model? Is there really a right answer?

[Read More]

Change Data Capture from PostgreSQL to Azure Data Explorer using Kafka Connect

This blog post demonstrates how you can use Change Data Capture to stream database modifications from PostgreSQL to Azure Data Explorer (Kusto) using Apache Kafka.

Change Data Capture (CDC) can be used to track row-level changes in database tables in response to create, update, and delete operations. It is a powerful technique, but it is only useful when there is a way to leverage these events and make them available to other services.

Introduction

Using Apache Kafka, it is possible to convert traditional batched ETL processes into real-time streaming pipelines. You can do it yourself (DIY) and write a good old Kafka producer/consumer using a client SDK of your choice. But why would you do that when you have Kafka Connect and its suite of ready-to-use connectors?
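
For a sense of what a downstream consumer actually receives, here is a trimmed-down Go sketch of a Debezium-style change event. The op codes (c, u, d, r) follow Debezium's event envelope; real events carry more metadata (source, transaction info, an optional schema envelope), and the record contents here are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// changeEvent models the core fields of a Debezium-style change event.
type changeEvent struct {
	Before json.RawMessage `json:"before"` // row state before the change (null for inserts)
	After  json.RawMessage `json:"after"`  // row state after the change (null for deletes)
	Op     string          `json:"op"`     // "c" create, "u" update, "d" delete, "r" snapshot read
	TsMs   int64           `json:"ts_ms"`
}

func main() {
	// A hypothetical event for an UPDATE on an "orders" table.
	raw := []byte(`{
		"before": {"order_id": 42, "status": "PENDING"},
		"after":  {"order_id": 42, "status": "SHIPPED"},
		"op": "u",
		"ts_ms": 1607299200000
	}`)

	var ev changeEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("op=%s after=%s\n", ev.Op, ev.After)
}
```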

[Read More]

Data Ingestion into Azure Data Explorer using Kafka Connect on Kubernetes

In this blog, we will go over how to ingest data into Azure Data Explorer using the open source Kafka Connect sink connector for Azure Data Explorer, running on Kubernetes with Strimzi. Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems using source and sink connectors, and Strimzi provides a "Kubernetes-native" way of running Kafka clusters as well as Kafka Connect workers.

Azure Data Explorer is a fast and scalable data exploration service that lets you collect, store, and analyze large volumes of data from diverse sources such as websites, applications, IoT devices, and more. It has a rich ecosystem of connectors that support ingestion into Azure Data Explorer. One of the supported sources is Apache Kafka, and the sink connector allows you to move data from Kafka topics into Azure Data Explorer tables, which you can later query and analyze. The best part is that you can do so in a scalable and fault-tolerant way using just configuration!

[Read More]

Build fault-tolerant applications with Cassandra API for Azure Cosmos DB

Azure Cosmos DB is a resource-governed system that allows you to execute a certain number of operations per second, based on the provisioned throughput you have configured. If clients exceed that limit and consume more request units than what was provisioned, subsequent requests are rate limited and exceptions are thrown; these are also referred to as 429 errors, after the HTTP status code.

With the help of a practical example, I'll demonstrate how to incorporate fault tolerance in your Go applications by handling and retrying operations affected by these rate limiting errors. To help you follow along, the sample application code for this blog is available on GitHub; it uses the gocql driver for Apache Cassandra.
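
As a minimal sketch of the idea (not the full sample from the post), the snippet below configures gocql with an exponential backoff retry policy against a Cosmos DB Cassandra API endpoint. The endpoint and credentials are placeholders, and exactly which errors get retried depends on gocql's retry semantics, so treat this as a starting point rather than a complete solution.

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Contact point and credentials are placeholders for a
	// Cosmos DB Cassandra API account.
	cluster := gocql.NewCluster("<cosmosdb-account>.cassandra.cosmos.azure.com")
	cluster.Port = 10350 // Cassandra API port for Azure Cosmos DB
	cluster.Authenticator = gocql.PasswordAuthenticator{
		Username: "<username>",
		Password: "<password>",
	}
	cluster.SslOpts = &gocql.SslOptions{EnableHostVerification: false}

	// Retry rate-limited operations with exponential backoff instead of
	// failing immediately; tune the bounds to your provisioned throughput.
	cluster.RetryPolicy = &gocql.ExponentialBackoffRetryPolicy{
		NumRetries: 5,
		Min:        100 * time.Millisecond,
		Max:        10 * time.Second,
	}

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	if err := session.Query("SELECT release_version FROM system.local").Exec(); err != nil {
		log.Fatal(err)
	}
	log.Println("query succeeded")
}
```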

[Read More]

Build a pipeline to join streams of real-time data

With traditional architectures, it's quite hard to counter the challenges posed by real-time streaming data; one such challenge is joining streams of data from disparate sources. For example, think about a system that accepts processed orders from customers (a real-time, high-velocity data source), where the requirement is to enrich these "raw" orders with additional customer info such as name, email, and location. A possible solution is to build a service that fetches customer data for each customer ID from an external system (for example, a database), performs the join in memory, and stores the enriched data in another database, perhaps as a materialized view. This approach has several problems, though, and one of them is not being able to keep up with high-volume data, i.e., to process it with low latency.
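
To see where the DIY version hurts, here is a deliberately naive Go sketch of that lookup-join service. All types and the lookupCustomer function are hypothetical stand-ins; the point is the per-event round trip to the external store.

```go
package main

import "fmt"

// Order is a raw event from the high-velocity stream; Customer is
// reference data living in an external database. Both are hypothetical.
type Order struct {
	OrderID    string
	CustomerID string
	Amount     float64
}

type Customer struct {
	ID, Name, Email, Location string
}

// EnrichedOrder is the joined, "materialized" view we want to produce.
type EnrichedOrder struct {
	Order
	Customer
}

// lookupCustomer stands in for a per-event query against an external
// database: the round trip that makes this approach hard to scale.
func lookupCustomer(id string) Customer {
	return Customer{ID: id, Name: "Jane", Email: "jane@example.com", Location: "Seattle"}
}

func main() {
	orders := []Order{{OrderID: "o-1", CustomerID: "c-42", Amount: 99.5}}

	for _, o := range orders {
		// One external lookup per order: at high volume, latency and
		// load on the lookup store become the bottleneck.
		c := lookupCustomer(o.CustomerID)
		enriched := EnrichedOrder{Order: o, Customer: c}
		fmt.Printf("%+v\n", enriched)
	}
}
```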

[Read More]

How to Ingest data from Kafka into Azure Data Explorer

This blog will cover data ingestion from Kafka to Azure Data Explorer (Kusto) using Kafka Connect.

Azure Data Explorer is a fast and scalable data exploration service that lets you collect, store, and analyze large volumes of data from diverse sources such as websites, applications, IoT devices, and more. The Kafka Connect platform allows you to stream data between Apache Kafka and external systems in a scalable and reliable manner. The Kafka Connect sink connector for Azure Data Explorer allows you to move data in Kafka topics to Azure Data Explorer tables, which you can later query and analyze.
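
As an illustration, the sketch below assembles a sink connector configuration of the kind you would submit to a Kafka Connect worker's REST API (see the earlier sketch). The key names follow the style of the Kusto sink connector but vary across connector versions, and all cluster, database, and table names are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	// Illustrative sink settings; key names are version-dependent and
	// all cluster/database/table names are hypothetical placeholders.
	connector := map[string]interface{}{
		"name": "adx-sink-connector",
		"config": map[string]string{
			"connector.class":             "com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkConnector",
			"topics":                      "storm-events",
			"kusto.url":                   "https://ingest-<cluster>.<region>.kusto.windows.net",
			"aad.auth.appid":              "<app-id>",
			"aad.auth.appkey":             "<app-key>",
			"aad.auth.authority":          "<tenant-id>",
			"kusto.tables.topics.mapping": `[{"topic":"storm-events","db":"storms","table":"events","format":"json"}]`,
			"flush.size.bytes":            "10000",
			"flush.interval.ms":           "1000",
		},
	}

	body, err := json.MarshalIndent(connector, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	// This JSON is what you would POST to the Kafka Connect REST API
	// to start the ingestion.
	fmt.Println(string(body))
}
```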

[Read More]