Azure Event Hubs is a fully managed data ingestion service capable of handling huge volumes of events, up to millions of events per second. If you have many IoT devices or distributed applications, this service lets you capture data from all of them. Once the data is captured, it is just a matter of adding the right service to process it.
Azure Event Hubs works as a message broker: you can throw an enormous amount of data at it and consume that data with other Azure services. If you are building any kind of data platform, this tool should be in your toolbox.
This service has a few concepts you should know about. Without a solid understanding of the way it works, it will be difficult to take advantage of it. Below is a summary of each important Event Hubs feature.
You can manage multiple streams of data ingestion by creating an Event Hubs namespace. A namespace is the place where you create hubs in order to control them: a container with a unique URL in which you can create multiple hubs and group them into logical entities. This architecture has clear benefits for business, and you can configure it to support other tools you are already using.
For Kafka users
If you are using Kafka, there is good news for you. You can use the Kafka endpoint that Event Hubs exposes: existing Kafka producers and consumers can talk to an Event Hubs namespace by changing only their connection configuration, without running a Kafka cluster yourself.
Event Hubs provides client libraries that let you publish events from .NET clients. For other runtimes and platforms you can use an Advanced Message Queuing Protocol (AMQP) client or the plain HTTPS protocol. Each has pros and cons, so do some research before you choose one. You can publish events individually or in batches.
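To make the batching idea concrete, here is a minimal sketch (not the Azure SDK) of the client-side rule the libraries apply: events are accumulated into a batch until a size limit is reached, then the batch is published as one request. The 1 MB default mirrors the service's per-batch size limit; the helper itself is hypothetical.

```python
import json

# Illustrative batching sketch: group JSON events into batches whose
# combined encoded size stays under a limit, the way Event Hubs client
# libraries batch events to cut per-request overhead.
MAX_BATCH_BYTES = 1 * 1024 * 1024  # mirrors the service's batch size limit

def batch_events(events, max_bytes=MAX_BATCH_BYTES):
    """Yield lists of JSON-encoded events whose total size stays under max_bytes."""
    batch, size = [], 0
    for event in events:
        payload = json.dumps(event).encode("utf-8")
        if len(payload) > max_bytes:
            raise ValueError("single event exceeds the batch size limit")
        if size + len(payload) > max_bytes and batch:
            yield batch          # flush the full batch before adding more
            batch, size = [], 0
        batch.append(payload)
        size += len(payload)
    if batch:
        yield batch

# Three small events easily fit into a single batch:
batches = list(batch_events([{"id": i} for i in range(3)]))
```

With a real client you would hand each yielded batch to one send call instead of sending events one by one.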
Each message sent to the hub lands in a partition, which is an ordered sequence of events. An event hub has at minimum 2 partitions and at most 32 by default.
If you want to organize your data, you can map incoming messages to a specific partition. For that you use a partition key.
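The idea behind a partition key can be sketched as follows: the service hashes the key and maps the hash onto one of the hub's partitions, so all events sharing a key land on the same partition and keep their relative order. The real hash function Event Hubs uses is internal; SHA-256 below is only for demonstration.

```python
import hashlib

def partition_for(key: str, partition_count: int) -> int:
    """Illustrative mapping of a partition key to a partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes of the hash and fold them onto the partitions.
    return int.from_bytes(digest[:8], "big") % partition_count

# The same key always maps to the same partition:
p1 = partition_for("device-42", 4)
p2 = partition_for("device-42", 4)
```

The consequence is that a hot key concentrates load on one partition, so keys should be chosen to spread traffic reasonably evenly.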
To make it even more powerful, you can save messages in different time windows. Azure calls this Capture. You specify time intervals after which data is written to Azure Storage or Azure Data Lake Store, and it scales automatically with throughput units. Capture simplifies your work by letting you concentrate on data-processing logic instead of on capturing the data, and you can run real-time processing or batch jobs on the captured streams. Data is written in the Apache Avro format.
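Capture writes one Avro file per partition per time window, and the default blob naming convention documented by Azure follows the pattern `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}`. The helper below simply renders that pattern for a given window start; the namespace and hub names are placeholders.

```python
from datetime import datetime, timezone

def capture_blob_name(namespace, hub, partition_id, window_start):
    """Render the default Capture blob path for one partition and time window."""
    return "{}/{}/{}/{:%Y/%m/%d/%H/%M/%S}".format(
        namespace, hub, partition_id, window_start)

# Placeholder names, just to show the resulting path shape:
name = capture_blob_name("my-namespace", "telemetry", 0,
                         datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```

Knowing this layout matters when you point a batch job (for example Azure Data Lake Analytics or Spark) at the captured files.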
SDKs available on all platforms let you consume messages starting from a specific offset or point in time. By moving the checkpoint and offset, you can consume messages several times, which makes your service very resilient.
An event consumer is an entity that reads data from an event hub. A consumer group is a view of the entire event hub. This enables multiple applications to consume events, each with its own separate view of the stream, so they can work independently. You can allow up to 5 concurrent readers on a partition per consumer group; however, only one is recommended.
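A minimal in-memory model can show how consumer groups, offsets, and checkpoints interact: each group keeps its own committed offset into the same ordered partition, so one group consuming an event does not hide it from the others. All names here are illustrative, not the Event Hubs API.

```python
class Partition:
    """Toy model of one partition with per-consumer-group checkpoints."""

    def __init__(self):
        self.events = []        # ordered sequence of events
        self.checkpoints = {}   # consumer group name -> committed offset

    def append(self, event):
        self.events.append(event)

    def read(self, group):
        """Return events after the group's checkpoint; committing is up to the caller."""
        offset = self.checkpoints.get(group, 0)
        return self.events[offset:]

    def commit(self, group, offset):
        """The consumer, not the hub, records how far it has read."""
        self.checkpoints[group] = offset

p = Partition()
for e in ["e1", "e2", "e3"]:
    p.append(e)

# Group A reads and checkpoints; group B is unaffected and still sees everything.
seen_a = p.read("group-a")
p.commit("group-a", len(seen_a))
seen_b = p.read("group-b")
```

Rewinding is just committing a smaller offset, which is why replaying a stream is cheap in this model.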
You can have several services listening to the hub: set up a consumer group per service and link each service to its own group, so each consumes the data from its own view.
You can limit access to the hub by creating a policy at the hub or namespace level.
Authentication uses Shared Access Signatures (SAS), available at the namespace or event hub level. A SAS token is generated from a SAS key; the token carries an HMAC-SHA256 signature computed over the resource URL and an expiry time.
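The token format is documented by Azure, so it can be sketched with the standard library alone: the signature is an HMAC-SHA256 over the URL-encoded resource URI plus the expiry, keyed with the SAS key. The URI, policy name, and key below are placeholders.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, policy_name, key, ttl_seconds=3600):
    """Build an Event Hubs SAS token per the documented algorithm."""
    expiry = int(time.time()) + ttl_seconds
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # Sign the encoded URI and expiry with the SAS key.
    string_to_sign = "{}\n{}".format(encoded_uri, expiry)
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"),
                 string_to_sign.encode("utf-8"),
                 hashlib.sha256).digest()).decode("utf-8")
    return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
        encoded_uri, urllib.parse.quote_plus(signature), expiry, policy_name)

# Placeholder namespace, hub, policy, and key:
token = generate_sas_token(
    "https://my-namespace.servicebus.windows.net/my-hub",
    "RootManageSharedAccessKey", "secret-key")
```

The resulting token goes into the `Authorization` header of HTTPS requests or into the AMQP connection properties.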
You can connect an event hub to a virtual network, which lets the hub recognize internal traffic. This works as a security feature: blocking all external traffic at the firewall level prevents attacks.
A very useful feature is geo-disaster recovery. You can pair namespaces between regions; if one region goes down, the namespace configuration is replicated to the paired namespace in a different geographical region so you can fail over.
Throughput unit: a unit of capacity allowing 1 MB per second of ingress and 2 MB per second of egress. In other words, per unit, data can be ingested at up to 1 MB/sec and consumed at up to 2 MB/sec. Throughput units can be managed programmatically through the Event Hubs API.
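These limits make capacity planning a small back-of-the-envelope calculation: the number of throughput units (TUs) you need is driven by whichever side, ingress or egress, demands more capacity. A sketch of that arithmetic:

```python
import math

def required_throughput_units(ingress_mb_per_s, egress_mb_per_s):
    """TUs needed given 1 MB/s ingress and 2 MB/s egress per unit."""
    return max(math.ceil(ingress_mb_per_s / 1.0),   # ingress-bound requirement
               math.ceil(egress_mb_per_s / 2.0))    # egress-bound requirement

# e.g. 5 MB/s in and 6 MB/s out: 5 TUs cover ingress, and 5 * 2 MB/s covers egress.
tus = required_throughput_units(5, 6)
```

Note this only accounts for bandwidth; per-unit event-rate limits may also apply, so treat the result as a lower bound.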
Offset: the position of an event within a partition. You can specify an offset as a timestamp or as an offset value.
Checkpoint: the process by which a reader marks or commits its position within a partition's event sequence. This is the responsibility of the consumer and occurs on a per-partition basis.
Consumer group: a unique view of the data for the services using it. After consuming a message, each group records its own checkpoint so it knows which messages it has already processed. Because checkpoints are per group, a message already consumed by one group remains visible to the other groups.
I am very interested in how you use Event Hubs and what you think about it.