The increasing complexity of enterprise IT requires CIOs and VPs of Operations to employ sophisticated tools for operations management. This is particularly true for Hybrid IT operations that may encompass multiple cloud environments as well as on-premise datacenters. Human monitoring alone is not enough to stay on top of today’s enterprise infrastructure, particularly with a lean staff.
Artificial Intelligence for IT Operations (AIOps) has emerged as a key technology for managing this complexity and helping executives hit their internal SLAs. AIOps connects data from your entire infrastructure, both cloud and on-premise, and applies machine learning to simplify monitoring, identify performance problems, and prescribe proactive maintenance solutions. The goal is to detect and solve problems before applications and end-users are affected.
Micro Focus Operations Bridge (OpsBridge) provides pervasive monitoring, a centralized Data Lake based on Vertica technology, and machine learning algorithms to support an AIOps solution that enterprises can rely on to handle the complexity of Hybrid IT. This post examines how Micro Focus OpsBridge delivers AIOps to help you lower operating costs, reduce downtime, and deliver on your SLAs.
What is AIOps?
Gartner defines AIOps platforms as software systems that "combine big data and AI or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management, and automation."
The promise of AIOps is not simply better monitoring with more sophisticated rules. Instead, the goal is machine-based event correlation, diagnosis, and intelligent remediation. To illustrate the difference, a monitoring system can alert your IT staff that “server X, at such-and-such IP address, supporting e-commerce transactions, is experiencing low throughput”. While that is useful information, it still requires your staff to investigate and figure out whether this is an isolated incident or part of a larger problem. In contrast, an AIOps system could tell you much more, such as: “Online checkout failures are spiking because of a throughput problem at server X. Previous observations of this condition have been followed within 90 minutes by e-commerce system downtime. Remediate by failing over to a backup server and rebuilding the primary.”
See the difference? It is enormous. The AIOps platform tells you that the issue with server X has come up before and it usually precedes a complete failure of the e-commerce application. It even suggests a remediation step – failing over to a backup server and rebuilding the primary. Having this information saves your IT staff a lot of digging and tells them how they can avoid downtime in the short term.
Notice that the AIOps system doesn’t necessarily know why server X is having problems, or why these problems may lead to e-commerce downtime. Figuring out the “why”, in many cases, is still going to take some human ingenuity and an understanding of how the e-commerce application works. But having an AIOps system detect the correlation and provide a short-term fix is still extremely helpful.
This example illustrates what Gartner is getting at when they indicate that AIOps can “partially replace a broad range of IT operations processes and tasks”. Here, the need for manually reviewing logs and discovering the danger to the e-commerce application was completely avoided by leveraging AI that can correlate events. In this respect, AIOps is a natural outgrowth of monitoring.
AIOps is a Natural Outgrowth of Monitoring
Over the last decade, we have seen the rise of monitoring tools that can track data about almost every aspect of an IT operation. Data can be gathered from log files, event monitors, APIs, packet sniffing, and from integration with special-purpose monitoring tools for networks, virtual machine environments, cloud providers, etc. According to InformationWeek, “the vast quantities of machine data generated by infrastructure hardware and software is too much for humans to analyze in a timely and cost-effective manner.”
That’s why Gartner envisions AIOps as the natural next step in IT operations automation. While monitoring tools can handle the acquisition and some aggregation of operations data, artificial intelligence is required to aggregate, standardize, and analyze this torrent of data.
With AIOps taking the grunt work out of aggregation, correlation, and predictive modeling, people are free to apply their knowledge of the operations environment to figure out what needs to be done to solve problems, and then act on it.
Getting back to the server example, the AIOps platform can detect a correlation between a server issue and e-commerce application downtime. That is valuable information. But from there, human intelligence is required to figure out the “why”. Understanding why the application goes down and how it may be related to server issues requires application domain knowledge and some sleuthing around to uncover the underlying issues. AIOps takes the grunt work out of analysis and helps your staff find time to investigate underlying design problems and solve them.
What Problems are Solved with AIOps?
Many challenges arise from the complexity of today’s Hybrid IT environments. For example, midsize to large companies now use (on average) eight different cloud service providers, according to a survey by IHS Markit Ltd. In these organizations, the amount of information that needs to be monitored exceeds the capacity of the IT operations staff. This information overload presents a set of problems that can be solved by AIOps systems.
At a basic level, AIOps can correlate and deduplicate monitoring messages. Today’s environments are highly instrumented, and a single event, such as CPU utilization exceeding a threshold, might generate multiple log entries and monitored events. AIOps can correlate and deduplicate these entries so that the monitoring system presents only one notification of the CPU threshold event. Similarly, all logging and monitoring contain a lot of noise that can be safely ignored. An AIOps system can be trained to suppress the noise and present only significant, deduplicated events to the IT staff doing the monitoring.
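To make the deduplication idea concrete, here is a minimal Python sketch. It is not how OpsBridge implements event reduction; it assumes a hypothetical `Event` record with host, metric, timestamp, source, and severity fields, treats events that share a host and metric and arrive within a hypothetical 60-second window as one notification, and uses a crude severity filter as a stand-in for a trained noise-suppression model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; real monitoring feeds carry far more fields.
    host: str
    metric: str        # e.g. "cpu_util"
    timestamp: float   # seconds since epoch
    source: str        # the log or monitor that emitted the event
    severity: str      # "info", "warning", or "critical"


def deduplicate(events: Iterable[Event], window_s: float = 60.0) -> list[dict]:
    """Collapse events that share (host, metric) and arrive within window_s
    seconds of the previous matching event into one notification."""
    notifications: list[dict] = []
    open_groups: dict[tuple[str, str], dict] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.severity == "info":
            # Crude noise filter standing in for a trained suppression model.
            continue
        key = (ev.host, ev.metric)
        group = open_groups.get(key)
        if group is not None and ev.timestamp - group["last_seen"] <= window_s:
            # Same underlying condition: fold this event into the open group.
            group["last_seen"] = ev.timestamp
            group["sources"].add(ev.source)
            group["count"] += 1
        else:
            # New condition (or the old one went quiet): start a notification.
            group = {
                "host": ev.host,
                "metric": ev.metric,
                "first_seen": ev.timestamp,
                "last_seen": ev.timestamp,
                "sources": {ev.source},
                "count": 1,
            }
            open_groups[key] = group
            notifications.append(group)
    return notifications
```

Fed three CPU warnings for the same host from syslog, an agent, and the hypervisor within a few seconds, the function returns a single notification with a count of 3 and all three sources listed. A real AIOps platform would learn the grouping keys and noise patterns rather than hard-coding them.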
Beyond eliminating the grunt work of cleaning up the logs, an AIOps system can automate historical analysis. One component of an AIOps system is a massive data store, sometimes referred to as a Data Lake, that keeps a historical record of all significant monitoring logs and events. So, if your operations staff notices, for example, that VDI latency is exceeding acceptable levels, they can use the AIOps system to quickly scan the historical records to find other times when this kind of latency has occurred. Looking at the historical record may yield insight into what caused the problem in the past and what actions could resolve the situation.
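A historical scan like the VDI example boils down to finding the past periods when a metric stayed above an acceptable level. The sketch below assumes latency samples are available as (timestamp, latency in ms) pairs, for instance exported from the Data Lake, and that the threshold is a value your team chooses; the function name and data layout are hypothetical, not part of any OpsBridge API.

```python
from typing import Sequence


def find_episodes(samples: Sequence[tuple[float, float]],
                  threshold_ms: float) -> list[tuple[float, float, float]]:
    """Return (start, end, peak) for each run of consecutive samples whose
    latency exceeds threshold_ms. Samples are (timestamp, latency_ms) pairs."""
    episodes = []
    start = end = peak = None
    for ts, latency in sorted(samples):
        if latency > threshold_ms:
            if start is None:
                # First sample of a new over-threshold episode.
                start, peak = ts, latency
            end = ts
            peak = max(peak, latency)
        elif start is not None:
            # Latency dropped back below the threshold: close the episode.
            episodes.append((start, end, peak))
            start = end = peak = None
    if start is not None:
        # The series ended while still above the threshold.
        episodes.append((start, end, peak))
    return episodes
```

Given the samples [(0, 80), (60, 200), (120, 220), (180, 90), (240, 170), (300, 100)] and a 150 ms threshold, it returns [(60, 120, 220), (240, 240, 170)]: two past episodes, each with its start, end, and peak latency, ready to be compared against other historical records.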
Taking this historical analysis a step further, the AIOps system can correlate these latency events with other historical observations. Such a correlation analysis may reveal a certain network configuration that was present each time the latency spiked. Armed with this information, the IT operations staff can look at that configuration, determine how it is impacting VDI latency, and take corrective action. This kind of event correlation and analysis can be performed by an AIOps system quickly, providing IT staff with insights in near real-time that can enable them to act to resolve critical operations problems.
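One simple way to approximate this kind of correlation analysis is to compare how often latency spiked while a given configuration attribute was active against the overall spike rate, a ratio usually called lift. The sketch assumes each observation window has already been labeled with the set of active configuration attributes and whether latency spiked; the attribute names and the `min_support` cutoff are hypothetical, and this is an illustrative technique rather than the OpsBridge algorithm.

```python
from collections import Counter
from typing import Iterable


def rank_config_suspects(windows: Iterable[tuple[frozenset[str], bool]],
                         min_support: int = 2) -> list[tuple[str, float]]:
    """Rank configuration attributes by lift = P(spike | attribute) / P(spike).
    Each window is (set of active config attributes, whether latency spiked)."""
    windows = list(windows)
    total = len(windows)
    spikes = sum(1 for _, spiked in windows if spiked)
    if total == 0 or spikes == 0:
        return []
    base_rate = spikes / total
    seen: Counter = Counter()             # windows in which each attribute was active
    seen_with_spike: Counter = Counter()  # ...and latency also spiked
    for attrs, spiked in windows:
        for attr in attrs:
            seen[attr] += 1
            if spiked:
                seen_with_spike[attr] += 1
    # Ignore attributes observed too rarely to say anything useful.
    scores = [(attr, (seen_with_spike[attr] / n) / base_rate)
              for attr, n in seen.items() if n >= min_support]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A lift well above 1.0 flags an attribute worth investigating, but correlation is not an explanation: as with the server X example, staff still need to work out why that configuration affects VDI latency.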
Going even further, the machine learning algorithms embedded in an AIOps system can learn from these historical correlations. An AIOps system uses the historical data to train its machine learning component to recognize monitored events or anomalies that indicate when a system failure is likely to occur. This kind of predictive analytics can be used to schedule preventative maintenance, such as rebuilding systems, rebooting servers, or adding memory, along with a host of other actions that keep operations running smoothly and avoid downtime.
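The “learning from history” step can be illustrated with a deliberately simple frequency model, standing in for the machine learning an AIOps platform would actually use. Assuming you have timestamped warning events and timestamped failures, it estimates, for each warning type, the fraction of past occurrences that were followed by a failure within a horizon (90 minutes here, echoing the server X example), then flags warning types whose estimate crosses a threshold. The function names, the horizon, and the 0.5 threshold are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Iterable


def learn_precursors(warnings: Iterable[tuple[float, str]],
                     failures: Iterable[float],
                     horizon_s: float = 90 * 60) -> dict[str, float]:
    """For each warning type, estimate the fraction of past occurrences that
    were followed by a failure within horizon_s seconds."""
    failure_times = sorted(failures)
    hits: defaultdict = defaultdict(int)
    totals: defaultdict = defaultdict(int)
    for ts, kind in warnings:
        totals[kind] += 1
        # Index of the first failure strictly after this warning.
        i = bisect_right(failure_times, ts)
        if i < len(failure_times) and failure_times[i] - ts <= horizon_s:
            hits[kind] += 1
    return {kind: hits[kind] / totals[kind] for kind in totals}


def should_schedule_maintenance(kind: str, model: dict[str, float],
                                threshold: float = 0.5) -> bool:
    """Recommend preventative maintenance when a warning type has
    historically preceded failures often enough."""
    return model.get(kind, 0.0) >= threshold
```

With three past `low_throughput` warnings, two of which were followed by a failure within 90 minutes, the model scores that warning type at about 0.67 and recommends maintenance, while a warning type that was never followed by a failure scores 0.0. A production system would use richer features and proper model validation rather than raw frequencies.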
Gartner uses the following chart to illustrate how AIOps supports continuous IT Operations Management (ITOM).
Real-time and historic information enters the AIOps platform from a variety of sources and in a variety of data types, including logs, metrics, wire data, and document text. The machine learning algorithms at the core of the system process this data to provide historical analysis, anomaly detection, performance analysis, and correlation and contextualization that drive efficient operations management and preventative maintenance.
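Of the capabilities in that pipeline, anomaly detection is the easiest to sketch. The example below uses a rolling z-score: each new value is compared with the mean and standard deviation of the preceding window of values and flagged if it lies more than a threshold number of standard deviations away. The 20-sample window and the 3.0 threshold are illustrative assumptions; production systems typically use more robust models that account for seasonality and trends.

```python
from statistics import mean, stdev
from typing import Sequence


def zscore_anomalies(values: Sequence[float], window: int = 20,
                     threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates from the mean of the preceding
    `window` values by more than `threshold` sample standard deviations.
    `window` must be at least 2 so that a standard deviation exists."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

For a metric that alternates between 10 and 11 and then jumps to 40, only the jump is flagged; a value of 11 at the same position is not.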
An Overview of Micro Focus Operations Bridge
Micro Focus Operations Bridge (OpsBridge) is an enterprise platform for IT Operations Management (ITOM). It monitors your entire infrastructure, gathering raw operations data and consolidating it with data from existing tools (e.g., network and virtual machine management platforms). OpsBridge provides automated discovery, so it can learn the topology of your infrastructure including traditional, private, public, multi-cloud, and container-based environments. Combining extensive data gathering with a detailed model of your infrastructure, OpsBridge provides unparalleled monitoring capabilities.
On top of this monitoring capability, OpsBridge provides AIOps capabilities built on big data and machine learning technology. The OpsBridge AIOps capabilities can surface important events, automate root cause identification and remediation, and direct action to resolve critical situations and minimize downtime. In addition, within the OpsBridge platform, the AIOps capabilities drive analytics and reporting via customizable dashboards that provide critical status updates, operations metrics, business KPIs, and actionable insights.
OpsBridge has been designed to support the needs of Hybrid IT environments requiring ITOM automation and AIOps support for both on-premise and cloud infrastructure. The AIOps capabilities consolidate and analyze data from both cloud and on-premise resources, providing a holistic view of your operations.
As illustrated in the diagram below, from a Micro Focus data sheet, OpsBridge centralizes data from myriad data sources, providing a single source of ITOM data.
To provide this service, OpsBridge integrates with probes and data collectors across your entire infrastructure, providing the information you need to measure service quality, control SLAs, and analyze application health. All this data is ultimately stored in a Data Lake where it can be processed by machine learning algorithms to support AIOps analytics and drive reporting and dashboards.
Micro Focus OpsBridge Supports AIOps with Vertica
At the center of the OpsBridge data collection, storage, analytics, and machine learning infrastructure is a Data Lake powered by Vertica technology. Micro Focus describes this as a Collect Once Store Once (COSO) Data Lake. The COSO Data Lake can receive high-volume and high-velocity data from multiple independent sources throughout your IT infrastructure. The Vertica platform can store this data in all of these formats (e.g., logs, metrics, wire data, text) and provides fast lookup, powerful query tools, and flexible reporting integration. Beyond serving as a high-performance data store, Vertica includes advanced analytics and in-database machine learning that support many of the OpsBridge AIOps features.
OpsBridge leverages this Vertica foundation, providing additional machine learning modules and an ITOM Container Deployment Foundation (CDF) that receives and processes operations data from your entire Hybrid IT environment before passing it along to Vertica storage.
CDF is built to deploy easily on a wide range of environments. Built on modern container technologies like Docker and Kubernetes, CDF runs on bare metal, in virtual environments, and in the cloud. This technology is designed to scale up to large Hybrid IT deployments and provide a manageable ITOM data collection capability to support OpsBridge in the most challenging enterprise environments.
IIS Can Help You Get Started with OpsBridge
International Integrated Solutions (IIS) is a Micro Focus Platinum Partner with extensive experience in IT Operations Management in both medium and large enterprises. As experts in AIOps, IIS can design, install, and help maintain an OpsBridge solution that leverages the power of machine learning to optimize performance, reduce cost, and minimize downtime. IIS can install this advanced technology, apply it effectively in your environment, and train your team to manage everything.
IIS is also a distinguished HPE partner, winning HPE Global Partner of the Year in 2016 and Arrow’s North American Reseller Partner of the Year in 2017. They worked with OpsBridge for many years before 2017, when it was part of the HPE SW portfolio, and have continued to actively partner with Micro Focus after the Micro Focus/HPE SW merger.