What is Azure Data Catalog? Well, that's a very good question. I will try to answer it, so bear with me for a few words.
Why do I need a data repository?
Azure Data Catalog is a service used to create a metadata catalog: a single source of truth where you can find information about your data assets. I want to highlight that data is a business asset and should be treated like one. Treating it that way gives you opportunities to grow your business and optimize processes. Data is the lifeblood of your company. Optimization here means that you no longer spend countless hours searching for data, because information about it is in one place, securely managed and categorized to simplify research.
Data Catalog is not just a database with information. It is a collaboration tool. Your teams can use it to share what they know about data, so that local, tacit knowledge becomes available to everyone in the organization. This tool is a nice add-on to the solution I was writing about earlier (Cortana Intelligence Suite).
Who is it for?
Essentially, everyone who will benefit from a data repository. Most likely that means analysts, data scientists and developers. It lets them find and consume data sources, which saves time on the way to a decision. This has a direct impact on business managers, who can make better decisions when information about systems and business logic is kept up to date. Having a single point of reference that works like an encyclopedia also improves data quality.
Which problem does it solve?
If you want to find out which data sources you have in your organization, you have to send tons of emails to various people to get this information. What I face on a daily basis is not even knowing whom to email. I don't know who is managing which systems. There is no single repository where this information is kept.
It solves the problem of organizational silos, where important systems and processes are known only locally. In Azure they call this tribal knowledge. Different teams hold knowledge in their heads and share it only on request. With a catalog you can take inventory of all registered resources.
The biggest problem Azure Data Catalog solves is synchronization. A local knowledge base is almost never kept in sync, so not everyone is up to date with changes. I would like to see a business where documentation is updated on a regular basis 🙁
How does it work?
Once you register a data source, its metadata is copied to Azure Data Catalog. The catalog keeps a reference to the original resource in the local application. Data Catalog places an index on the metadata, which makes it fast to search.
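To make the idea more concrete, here is a minimal sketch in Python of what "copy the metadata, keep a reference, build an index" could look like. This is not the Azure Data Catalog API. The names AssetMetadata, ToyCatalog, register and lookup, as well as the example locations, are all made up for illustration, and the real service surely does much more than this.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Hypothetical record: metadata only, the data stays in the source system."""
    name: str
    source_type: str                      # e.g. "SQL Server table"
    location: str                         # reference back to where the data lives
    columns: list = field(default_factory=list)


class ToyCatalog:
    """Hypothetical in-memory catalog, not the real service."""

    def __init__(self):
        self.assets = {}                  # asset name -> AssetMetadata
        self.index = defaultdict(set)     # word -> names of assets containing it

    def register(self, asset):
        """Store a copy of the metadata and index every word in it."""
        self.assets[asset.name] = asset
        text = " ".join([asset.name, asset.source_type, *asset.columns])
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[token].add(asset.name)

    def lookup(self, word):
        """Return the names of assets whose metadata contains the word."""
        return sorted(self.index.get(word.lower(), set()))


catalog = ToyCatalog()
catalog.register(AssetMetadata(
    name="SalesOrders",
    source_type="SQL Server table",
    location="sqlserver://sales-db.example.local/Sales/dbo.Orders",  # made-up address
    columns=["OrderId", "CustomerId", "Amount"],
))
catalog.register(AssetMetadata(
    name="CustomerList",
    source_type="Excel workbook",
    location="file://fileshare.example.local/crm/customers.xlsx",    # made-up address
    columns=["CustomerId", "Name"],
))

# The lookup goes through the index instead of scanning every asset.
print(catalog.lookup("customerid"))
# The catalog only points at the data; it never held the rows themselves.
print(catalog.assets["SalesOrders"].location)
```

The point of the index is that a search touches one dictionary entry instead of every registered asset, which is why searching stays fast as the catalog grows.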
Once metadata resides in Data Catalog, users can annotate it with descriptions or tags. It can also hold documentation about processes. This helps the organization discover data sources and understand how they are used. The search feature lets you quickly find relevant information. Users can access the catalog and contribute what they know. You can also use Excel to connect to Data Catalog and view the information.
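Annotation and search can be pictured with another small Python sketch. Again, this is only an illustration under my own assumptions: CatalogEntry, annotate and search are hypothetical names and not part of any Azure SDK, and the sample entries are invented.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry that users can enrich over time."""
    name: str
    description: str = ""
    tags: set = field(default_factory=set)


def annotate(entry, description=None, tags=()):
    """Add a description and/or tags contributed by a user."""
    if description is not None:
        entry.description = description
    # Tags are stored in lower case so that "Sales" and "sales" match.
    entry.tags.update(tag.lower() for tag in tags)


def search(entries, term):
    """Return names of entries whose name, description or tags match the term."""
    term = term.lower()
    return [
        e.name
        for e in entries
        if term in e.name.lower()
        or term in e.description.lower()
        or term in e.tags
    ]


entries = [CatalogEntry("SalesOrders"), CatalogEntry("CustomerList")]

# Two different users share what they know about the data.
annotate(entries[0],
         description="One row per confirmed order, refreshed nightly.",
         tags=["sales", "finance"])
annotate(entries[1], tags=["Sales", "crm"])

print(search(entries, "nightly"))   # found through the description
print(search(entries, "crm"))       # found through a tag
print(search(entries, "sales"))     # matches a name and a tag
```

Notice that CustomerList shows up in a search for "sales" only because somebody tagged it. That is the collaboration part: the more people annotate, the easier the data is to find.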
You can find nice step-by-step guides on the Azure site.
Let me know if you find this useful or what particular information you may need.