Dictionary of Azure Event Hub

Last Updated on 10 February 2019 by Krzysztof Nojman

Azure Event Hub

Summary

Azure Event Hub is a fully managed data ingestion service capable of coping with huge volumes of events: it can handle millions of events per second. If you have many IoT devices or pieces of distributed software, this service allows you to capture data from all of them. Once you have captured the data, it is just a matter of adding the relevant services to process it.

Azure Event Hub works as a message broker. You can throw an enormous amount of data at it and consume that data using other Azure services. If you are going to build any kind of data platform, this tool should be in your toolbox.

This service has a few options you should know about. Without a complete understanding of the way it works, it will be difficult to take full advantage of it. Below is a summary of each of the most important Azure Event Hub features.

Namespace

You can manage multiple streams of data ingestion by creating an Event Hub namespace. A namespace is a container with a unique URL in which you create multiple hubs, so you can group your hubs into logical entities and control them from one place. Such an architecture has clear benefits for a business: you can configure it to support the other tools you are using.

For Kafka users

If you are using Kafka, there is good news for you: Event Hub exposes a Kafka endpoint, so you can use the Kafka protocol and point your existing Kafka applications at Event Hub. This provides an alternative to running your own Kafka clusters.

Publishing

Event Hub provides client libraries for publishing events from .NET clients. For other runtimes and platforms you can use an Advanced Message Queuing Protocol (AMQP) client or plain HTTPS. Each has pros and cons, so do some research before you choose one. You can publish events individually or in batches.
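To get a feel for batch publishing, here is a minimal sketch in plain Python (not the actual Azure SDK) of the general idea: events are grouped into batches that stay under a maximum payload size before being sent. The size limit and function name are illustrative assumptions.

```python
# Illustrative sketch: group events into batches that stay under a
# maximum payload size, the way Event Hub client libraries do before
# publishing. This is not the real SDK, just the batching idea.
def batch_events(events, max_batch_bytes=1024):
    batches, current, current_size = [], [], 0
    for event in events:
        size = len(event.encode("utf-8"))
        # Start a new batch when adding this event would exceed the limit.
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(event)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Publishing in batches amortises the per-request overhead, which matters when you are sending millions of small events.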

Partitioning

Each message arriving at the hub is written to a partition, which is an ordered sequence of events. A hub has a minimum of 2 partitions and a maximum of 32; if you want more partitions, you can contact the Event Hub team. Messages sent to the hub are written to a partition in order of arrival, with each new event added to the end of the sequence. A very useful feature is message retention, which allows the hub to hold messages for up to 7 days. This helps if you want to reprocess messages multiple times. A consumer can read a specific subset, or partition, of the message stream.

If you want to organize your data, you can map incoming messages to a specific partition by supplying a partition key.
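Event Hub's actual partition-key hashing is internal to the service, but the general idea can be sketched like this: a stable hash of the key is mapped onto the partition count, so all events with the same key land in the same ordered partition. The hash choice and function name below are illustrative assumptions, not the service's algorithm.

```python
import hashlib

# Illustrative sketch: map a partition key to a stable partition index,
# so related events always land in the same ordered sequence.
# Event Hub's real hashing is internal to the service.
def partition_for(key: str, partition_count: int = 4) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

The key property is determinism: events for `device-42` always end up in the same partition, which preserves their relative order.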

Capture

To make it even more powerful, you can save messages in different time windows. Azure calls this Capture. You specify time intervals, and the data for each interval is written to Azure Storage or Azure Data Lake Store in Apache Avro format. Capture scales automatically with throughput units and simplifies your work by letting you concentrate on data processing logic instead of on capturing the data. You can then run real-time processing or batch jobs on the streams.

SDKs are available on all platforms and allow you to consume messages from specific predefined times. By moving the checkpoint and offset you can consume messages several times, which makes your service very resilient.

Consumer Groups

An event consumer is an entity that reads data from Event Hub. A consumer group is a view of the entire event hub. This enables multiple applications to consume events, each with a separate view of the stream, so they can work independently. You can allow up to 5 concurrent readers on a partition per consumer group, although only one is recommended.

You can create different services listening to the hub, set up a consumer group for each of them, and link each service to its group so it consumes data independently of the others.
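The independence of consumer groups boils down to each group keeping its own cursor over the same partition. Here is a minimal sketch of that idea in plain Python (the group names and list-based "partition" are illustrative, not the service's API):

```python
# Illustrative sketch: two consumer groups read the same partition
# independently, each tracking its own offset, so reading an event in
# one group does not hide it from the other.
partition = ["event-0", "event-1", "event-2", "event-3"]
offsets = {"group-a": 0, "group-b": 0}  # one cursor per consumer group

def read_next(group):
    event = partition[offsets[group]]
    offsets[group] += 1  # advance only this group's view
    return event
```

Because the events themselves are never removed, a slow analytics group and a fast alerting group can both work through the full stream at their own pace.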

Security

You can limit access to the hub by creating a policy at the hub or namespace level.

Authentication uses Shared Access Signatures (SAS), which are available at the namespace or event hub level. A SAS token is generated from a SAS key: it is an HMAC-SHA256 signature computed over the resource URL and an expiry time.
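The token scheme can be sketched with nothing but the Python standard library, following the documented pattern of signing the URL-encoded resource URI plus an expiry timestamp with the SAS key. The URI, key name, and key below are placeholder examples:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

# Sketch of the SAS token scheme: an HMAC-SHA256 signature over the
# URL-encoded resource URI plus an expiry timestamp, signed with the
# SAS key named by key_name. Values passed in are example placeholders.
def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600):
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    ).decode("utf-8")
    return ("SharedAccessSignature sr={}&sig={}&se={}&skn={}"
            .format(encoded_uri, urllib.parse.quote_plus(signature),
                    expiry, key_name))
```

The token carries the signed URI, the signature, the expiry, and the policy name, so the service can verify it without a round trip to any identity provider.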

You can connect Event Hub to a virtual network, which lets the hub recognize internal traffic. This works as a security feature: blocking all external traffic at the firewall level helps prevent attacks.

Redundancy

A very useful feature is geo-recovery. You can pair namespaces between regions, so if one region goes down you can replicate to a different geographical region.

Dictionary

Throughput unit: a unit of capacity equal to 1 MB per second of data ingress (data sent into the hub) and 2 MB per second of egress (data read out). Throughput units can be managed programmatically through the Event Hub API.
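As a quick sizing sketch, the number of throughput units you need is driven by whichever side of the 1 MB/s-in, 2 MB/s-out budget you exhaust first. The helper below is an illustrative back-of-the-envelope calculation, not an official sizing tool:

```python
import math

# Illustrative sizing arithmetic: how many throughput units (TUs) a
# workload needs, assuming 1 MB/s ingress and 2 MB/s egress per TU.
def required_tus(ingress_mb_per_s, egress_mb_per_s):
    return max(math.ceil(ingress_mb_per_s / 1.0),   # ingress budget
               math.ceil(egress_mb_per_s / 2.0))    # egress budget
```

For example, a workload sending 5 MB/s and reading 4 MB/s is ingress-bound and needs 5 TUs, while one sending 1 MB/s and reading 6 MB/s is egress-bound and needs 3.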

Offset: the position of an event within a partition. You can specify an offset as a timestamp or as an offset value.

Checkpoint: a reader can mark, or commit, its position within a partition sequence so that it can later resume from that point rather than from the start of the retained stream. Checkpointing is the responsibility of the consumer and occurs on a per-partition basis.
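The resume-after-restart behaviour can be sketched in a few lines of plain Python (the dictionary store and function name are illustrative assumptions, standing in for the SDK's checkpoint store):

```python
# Illustrative sketch: a consumer commits a checkpoint per partition so
# that after a crash or restart it resumes from the last committed
# offset instead of re-reading the whole retained stream.
checkpoints = {}  # partition id -> last committed offset

def process(partition_id, events):
    start = checkpoints.get(partition_id, 0)  # resume from last commit
    for offset in range(start, len(events)):
        # ... handle events[offset] here ...
        checkpoints[partition_id] = offset + 1  # commit after handling
    return checkpoints.get(partition_id, start)
```

Calling it again after new events arrive picks up exactly where the last commit left off, which is what makes reprocessing and recovery cheap.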

Consumer groups: a unique view of the data for the services using it. Each group maintains its own checkpoints recording which messages it has consumed, so a message already consumed by one group can still be visible to the other groups.

I am very interested to hear how you use Event Hub and what you think about it.

Filed Under: Cloud, Tools
