With businesses now able to capture, store and process vast amounts of data about almost everything, managing and using that data to improve business performance has become critical. Data is what drives machine learning models, and in general the more relevant data you have, the better. It is perhaps no surprise, then, that the big cloud vendors started investing in data warehouses and data lakes early on. But that is just a first step: you also need the analytics tools to make all of this data useful. Microsoft Azure’s analytics services provide a cost-effective and powerful platform on which to build, deploy and manage business intelligence, advanced analytics and Big Data solutions. Here we explore how to use the components of Azure to make the most of enterprise Big Data.
A Recap On What Big Data Is
Big Data is the term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. Analysts usually characterise it along three dimensions:
- Volume – the amount of data to collect and store from varying sources such as business transactions, social media, and sensor or machine-to-machine (IoT) data.
- Velocity – the speed at which data streams in. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near real-time.
- Variety – the different types of formats data comes in – from structured numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Big Data flows can be highly inconsistent, with periodic peaks that make data loads challenging to manage, especially with unstructured data. Because the data also comes from multiple sources, it is difficult to link, match, cleanse and transform across systems. You need to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly get out of control.
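To make "link, match and cleanse" concrete, here is a minimal Python sketch. It assumes two hypothetical sources, a CRM export and a web event log, that share an email address as the only common key; the field names and sample rows are invented for illustration and do not come from any Azure service.

```python
def clean_email(raw: str) -> str:
    """Cleanse: trim whitespace and lower-case so equal addresses compare equal."""
    return raw.strip().lower()


def link_records(crm_rows, web_rows):
    """Match web rows to CRM rows on the cleansed email address.

    Returns (linked, unmatched): linked maps customer_id to the pages visited,
    unmatched lists web rows whose email is not known to the CRM.
    """
    by_email = {clean_email(r["email"]): r["customer_id"] for r in crm_rows}
    linked = {}
    unmatched = []
    for row in web_rows:
        customer_id = by_email.get(clean_email(row["email"]))
        if customer_id is None:
            unmatched.append(row)
        else:
            linked.setdefault(customer_id, []).append(row["page"])
    return linked, unmatched


# Hypothetical sample data: same people, inconsistent formatting.
crm = [{"customer_id": "C1", "email": "Ana@Example.com "},
       {"customer_id": "C2", "email": "bo@example.com"}]
web = [{"email": "ana@example.com", "page": "/pricing"},
       {"email": " BO@example.com", "page": "/docs"},
       {"email": "zed@example.com", "page": "/home"}]

linked, unmatched = link_records(crm, web)
print(linked)
print(unmatched)
```

Without the cleansing step, two of the three web rows would fail to match. Keeping the unmatched rows rather than discarding them is a deliberate choice here, since they usually point to a data quality problem worth investigating.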
Sources for Big Data generally fall into one of three categories:
- Streaming Data – this is data that reaches your IT systems from a web of connected devices. You can analyse this information as it arrives.
- Social Media Data – this is data on social interactions, particularly relevant for marketing, sales and support functions. It is often unstructured or semi-structured, so it can be challenging to process.
- Publicly Available Sources – massive amounts of data are available through open data sources, and are usually used to provide richer context for analysis.
It is important to remember that the primary value from Big Data comes not from the data in its raw form, but from the processing and analysis of it and the insights, products and services that emerge from analysis. The sweeping changes in Big Data technologies and management approaches need to be accompanied by similarly dramatic shifts in how data supports decisions and product/service innovation.
In the past, the main barriers to the adoption of Big Data platforms in Australia have been cost, a lack of in-house skills and IT infrastructure requirements. Microsoft has tackled these head-on and offers a compelling answer to each.
What Does Microsoft Azure Offer In Analytics Services For Big Data?
The Big Data components in Microsoft Azure let you build solutions which can process billions of events, using technologies you already know. Building such a solution on Azure means deploying a suite of complementary products that integrate to form a comprehensive Big Data offering. Microsoft was recognised as a Leader by Gartner in its 2019 Analytics and Business Intelligence Magic Quadrant report for the twelfth year in a row, mainly due to its comprehensive and visionary product roadmap and intuitive user experience. To understand why, let’s look at the capabilities required to build a Big Data solution in Azure. Essentially, there are four steps to consider.
1. Pulling Together The Data Sources
Any Big Data solution starts with data sources. To build a solution, large volumes of data need to be sourced and stored so that the consolidated datasets can be processed. Data sources can be structured or unstructured and can come from anywhere. To illustrate this, let’s take the example of a real-time traffic management system. Data sources could include video surveillance footage, data from sensors installed on the road network, and even GPS data from vehicles using that network. Big Data solutions need a vast amount of related data from different sources to build accurate models. Microsoft’s Azure Data Factory is the platform’s service for automated data movement, transformation and integration.
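As a rough illustration of what consolidating the traffic example involves, the Python sketch below maps two differently shaped feeds onto one common record. It is not the Azure Data Factory API; the payload layouts, field names and the `TrafficEvent` type are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrafficEvent:
    """Common schema that every source is transformed into (hypothetical)."""
    source: str
    road_id: str
    observed_at: datetime
    speed_kmh: float


def from_road_sensor(row: dict) -> TrafficEvent:
    # Assumed sensor payload: epoch seconds, speed already in km/h.
    return TrafficEvent("sensor", row["road"],
                        datetime.fromtimestamp(row["ts"], tz=timezone.utc),
                        float(row["kmh"]))


def from_vehicle_gps(row: dict) -> TrafficEvent:
    # Assumed GPS payload: ISO 8601 timestamp, speed in metres per second.
    return TrafficEvent("gps", row["road_id"],
                        datetime.fromisoformat(row["time"]),
                        float(row["speed_ms"]) * 3.6)


def consolidate(sensor_rows, gps_rows):
    """Transform both feeds into TrafficEvent records, ordered by time."""
    events = [from_road_sensor(r) for r in sensor_rows]
    events += [from_vehicle_gps(r) for r in gps_rows]
    return sorted(events, key=lambda e: e.observed_at)


sensor_rows = [{"road": "M1", "ts": 1559376000, "kmh": 42.0}]
gps_rows = [{"road_id": "M1", "time": "2019-06-01T08:00:05+00:00", "speed_ms": 10.0}]

for event in consolidate(sensor_rows, gps_rows):
    print(event.source, event.road_id, event.observed_at.isoformat(), event.speed_kmh)
```

The point of the sketch is the shape of the work: each source gets its own small adapter that handles units and timestamp formats, so everything downstream only ever sees one schema.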
2. Integration and Data Storage
Once the data sources are identified, the data needs to be processed and stored. Azure has a wide variety of integration and data storage services to meet the diverse needs of a Big Data solution. As each Big Data solution is unique, the right set of technologies needs to be chosen to suit the solution being built.
Microsoft Azure HDInsight is Microsoft’s Big Data service, a 100% Apache Hadoop-based offering in the Azure cloud. It is a fully managed cloud service that makes processing massive amounts of data easy, fast and cost-effective, and it lets you use widely adopted open source Big Data frameworks such as Hadoop, Spark, Hive and R, among others.
HDInsight combines the integration and data storage services needed for a Big Data solution, and as such is the preferred platform for building these types of solutions. It is a cloud-native service which is globally available and meets the necessary security and compliance requirements. It also lets you use a variety of productivity tools, ranging from Microsoft Visual Studio to Eclipse and IntelliJ, and supports Scala, Python, R, Java and .NET.
In addition to HDInsight, Azure offers a wide range of integration services which can be used to build Big Data solutions. These range from the standard SQL Server Integration Services to a variety of other Azure Integration Services, including Service Bus. Azure also offers more specialised integration services such as Logic Apps, for orchestrating workflows between systems, and Event Hubs, for ingesting high volumes of events, which suits IoT Big Data solutions.
Microsoft Azure has a wide range of data storage solutions which can be used as the data store for Big Data solutions. These range from Azure SQL Database to a full data warehousing solution with SQL Data Warehouse. Big Data comes in many formats, and Data Lake Storage helps with managing this variety since it can handle both structured and unstructured data (and is optimised for the Spark and Hadoop analytics engines). The service can ingest any kind of data, yet Microsoft still promises that it will be fast.
3. Data Models and Analytics
Once the Big Data solution’s data storage and integration services are defined and implemented, the next step is to perform analysis using data models and analytics.
Azure’s range of analytics offerings is vast, with many different services dedicated to analytics, artificial intelligence and IoT. Naturally, you would not use all of them in a single Big Data solution. As mentioned previously, a Big Data solution consists of a suite of relevant technologies which are integrated to form a solution platform. The analysis service you choose therefore depends entirely on the type of analysis you are performing on the collected data.
Azure Analysis Services is Microsoft’s enterprise-grade analytics engine as a service for general-purpose analysis. Log Analytics can collect, search and visualise machine data from on-premises and cloud services, whereas Stream Analytics analyses real-time data streams from IoT devices. If your solution requires an Apache Spark-based analytics platform, Azure Databricks would be the right choice. It is an analytics service and architecture based on Apache Spark in which data is prepared, models are trained and data is processed for Big Data analytics, artificial intelligence solutions (search mining, machine learning, AI apps and agents) and, alongside services such as Data Explorer, real-time analytics on streaming data. Data Lake Analytics can run massively parallel processing programs in a variety of programming languages over petabytes of data stored in Azure Data Lake.
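To give a feel for the kind of computation a stream analytics engine performs on device data, here is a small Python sketch of a tumbling (fixed, non-overlapping) window average. It is a plain-Python illustration of the idea only, not Stream Analytics or Databricks code; the function name, tuple layout and meter readings are assumptions for this example, and a real engine would process events incrementally and deal with late or out-of-order arrivals.

```python
from collections import defaultdict


def tumbling_window_average(readings, window_seconds):
    """Average readings per device within fixed, non-overlapping time windows.

    readings: iterable of (timestamp_seconds, device_id, value) tuples.
    Returns a dict mapping (window_start, device_id) to the mean value.
    """
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per bucket
    for ts, device_id, value in readings:
        window_start = ts - (ts % window_seconds)
        bucket = totals[(window_start, device_id)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical smart-meter readings: (seconds since start, device, value).
readings = [
    (0, "meter-1", 10.0),
    (20, "meter-1", 14.0),
    (65, "meter-1", 30.0),
    (70, "meter-2", 5.0),
]

averages = tumbling_window_average(readings, window_seconds=60)
for (window_start, device_id), mean in sorted(averages.items()):
    print(window_start, device_id, mean)
```

Here the two readings from meter-1 in the first minute are averaged together, while the reading at 65 seconds falls into the next window and is reported separately.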
The services mentioned are just a few of the many analysis services available on Microsoft Azure. As Big Data is such a wide and varied field, you need to tailor the analytics service you choose to the solution you have created, and Azure offers a broad range of options for doing so.
4. Visualisation and Reporting
The final piece you need to complete a Big Data solution is the visualisation and reporting platform. As with other parts of a Big Data solution, there are numerous options available, and you need to choose the services which best align with the objectives of your solution.
Azure, and Microsoft more broadly, offer a variety of reporting and visualisation tools for this purpose. You could display reports using SQL Server Reporting Services for structured reporting, while for more interactive, self-service dashboards Microsoft Power BI is the preferred option. Power BI also provides an embedded analytics capability in Azure for contextual insights within existing applications. Using Microsoft’s Common Data Model, Power BI users can analyse data using a standard data schema without performing complex data transformation. The analytics available span the full spectrum, from descriptive to predictive and prescriptive, as well as advanced analytics models built in Azure.
In addition to the capabilities outlined above, it is worth noting that Microsoft promotes Azure as a reliable cloud analytics platform with the most comprehensive compliance offerings and more certifications than any other cloud vendor, combined with advanced identity governance and access management through Active Directory integration. It also boasts a 99.95% availability SLA and 24×7 technical support. GigaOm’s latest report indicates that Azure SQL Data Warehouse is now 14 times faster and 94% cheaper than other cloud analytics platforms on the market. This price performance extends to the rest of the Microsoft Azure analytics stack, including Azure Data Lake and Azure Databricks.
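To put the 99.95% availability figure in practical terms, the short Python calculation below converts an availability percentage into the downtime it still permits. It is simple arithmetic only; the function name is our own, and which services the SLA covers and how downtime is measured are matters for the SLA terms themselves.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int) -> float:
    """Minutes of downtime permitted over a period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


monthly = allowed_downtime_minutes(99.95, 30)   # 30-day month
yearly = allowed_downtime_minutes(99.95, 365)   # non-leap year

print(f"Per 30-day month: {monthly:.1f} minutes")     # 21.6 minutes
print(f"Per 365-day year: {yearly / 60:.2f} hours")   # 4.38 hours
```

In other words, a 99.95% target still leaves room for roughly 22 minutes of unavailability a month, which is worth factoring into the design of any real-time analytics workload.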
If you’d like to know more, this technical ebook presents three practical use cases that use Microsoft Azure Databricks to solve Big Data and AI challenges: a churn analysis model, a movie recommendation engine and an intrusion detection demonstration.