Getting started with Azure Data Explorer and Azure Synapse Analytics for big data processing

Azure Data Explorer is a fully managed data analytics service that can handle large volumes of diverse data from any data source, such as websites, applications, and IoT devices. It makes it simple to ingest this data and enables you to run complex ad hoc queries on it in seconds. It scales to terabytes of data in minutes, allowing rapid iterations of data exploration to discover relevant insights. Azure Data Explorer already integrates with Apache Spark via its data source and data sink connector, and it is used to power solutions for near real-time data processing, data archiving, machine learning, and more.
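To give a feel for that integration, here is a minimal PySpark sketch of reading from Azure Data Explorer with the open-source Kusto Spark connector. The cluster URL, database, query, and AAD app values are all placeholders, and the option keys follow the connector's documented string names:

```python
# Minimal sketch: reading from Azure Data Explorer (Kusto) into a Spark
# DataFrame. Requires the Kusto Spark connector on the cluster classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adx-read-example").getOrCreate()

df = (spark.read
      .format("com.microsoft.kusto.spark.datasource")
      .option("kustoCluster", "https://mycluster.westus.kusto.windows.net")  # placeholder
      .option("kustoDatabase", "mydatabase")                                 # placeholder
      .option("kustoQuery", "StormEvents | take 1000")                       # any KQL query
      .option("kustoAadAppId", "<aad-app-id>")                               # placeholder
      .option("kustoAadAppSecret", "<aad-app-secret>")                       # placeholder
      .option("kustoAadAuthorityID", "<aad-tenant-id>")                      # placeholder
      .load())

df.show()
```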

[Read More]

Securely access Azure SQL Database from Azure Synapse

The Apache Spark connector for Azure SQL Database (and SQL Server) enables these databases to be used as input data sources and output data sinks for Apache Spark jobs. You can use the connector in Azure Synapse Analytics to run big data analytics on real-time transactional data and to persist the results for ad hoc queries or reporting.
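As a rough illustration, here is what using the connector as a sink can look like in PySpark. The server, database, table, and credentials are placeholders; the connector jar must be available on the Spark pool:

```python
# Minimal sketch: writing a DataFrame to Azure SQL Database with the
# Apache Spark connector for SQL Server / Azure SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("azure-sql-sink").getOrCreate()

# Toy result set standing in for a real computation
results = spark.createDataFrame(
    [("o-1001", 42.50), ("o-1002", 17.95)],
    ["order_id", "amount"],
)

url = "jdbc:sqlserver://myserver.database.windows.net;databaseName=mydatabase"  # placeholder

(results.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("append")
    .option("url", url)
    .option("dbtable", "dbo.OrderTotals")   # placeholder target table
    .option("user", "sqluser")              # placeholder
    .option("password", "<password>")       # better: pull from Key Vault, below
    .save())
```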

At the time of writing, there is no linked service or AAD pass-through support for the Azure SQL connector in Azure Synapse Analytics. You can, however, use other options, such as Azure Active Directory authentication or direct SQL authentication (username and password based). A secure way of doing this is to store the Azure SQL Database credentials in Azure Key Vault (as a secret), which is what this short blog post covers.
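For reference, a minimal sketch of fetching such a secret inside a Synapse notebook with mssparkutils; the Key Vault name, secret name, and linked service name are all placeholders, and an overload without the linked service argument (using the caller's identity) also exists:

```python
# Minimal sketch: retrieving the database password from Azure Key Vault
# in a Synapse Spark notebook, then passing it to the connector.
from notebookutils import mssparkutils

sql_password = mssparkutils.credentials.getSecret(
    "my-key-vault",     # Key Vault name (placeholder)
    "sql-password",     # secret name (placeholder)
    "AzureKeyVaultLS",  # Key Vault linked service name (placeholder)
)
# ...then use sql_password in .option("password", sql_password) instead of
# embedding the credential in the notebook.
```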

[Read More]

Build a pipeline to join streams of real-time data

With traditional architectures, it’s quite hard to counter the challenges imposed by real-time streaming data. One such use case is joining streams of data from disparate sources. For example, think about a system that accepts processed orders from customers (a real-time, high-velocity data source) with a requirement to enrich these “raw” orders with additional customer info such as name, email, and location. A possible solution is to build a service that fetches customer data for each customer ID from an external system (for example, a database), performs an in-memory join, and stores the enriched data in another database, perhaps as a materialized view. This approach has several problems, though, one of which is the inability to keep up (that is, process with low latency) with high-volume data.
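Keeping with the Spark theme of this digest, one way to frame the enrichment problem is a stream-static join in Spark Structured Streaming, where each micro-batch of orders is joined against reference data instead of issuing per-record lookups. The sketch below is illustrative only (the linked post may take a different approach); the Kafka broker, topic, paths, schema, and column names are all assumed placeholders:

```python
# Minimal sketch: enriching a stream of raw orders with static customer data.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream-enrichment").getOrCreate()

# Static reference data: customer info (customer_id, name, email, location)
customers = spark.read.parquet("/data/customers")  # placeholder path

order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Streaming source: raw orders carrying only a customer_id
orders = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "orders")                     # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), order_schema).alias("o"))
          .select("o.*"))

# Stream-static join: each micro-batch is enriched without per-record
# round trips to an external system.
enriched = orders.join(customers, "customer_id", "left")

query = (enriched.writeStream
         .format("console")   # swap for a real sink in practice
         .outputMode("append")
         .start())
query.awaitTermination()
```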

[Read More]