A day in the life of an SD-WAN admin: “What in the world should I pay attention to in all these graphs?!”
by Jim Meehan
This week I joined Augtera Networks as Sr. Director of Product and Product Marketing. I’m not one to jump ship lightly: my tenures at Kentik and Arbor Networks were 7 and 11 years, respectively. So transitions like these give me an opportunity to pause and reflect on the industry’s past and future, and on my own as well. I’ve spent my entire career in the world of infrastructure, much of it related to networks but also to systems and software. For as long as I’ve been immersed in it, infrastructure has always been one step ahead of our ability to operate it as effectively as we’d like. I’m old enough to remember when it seemed impossible just to generate the data we needed to properly troubleshoot networks: the days when our best tooling was a packet sniffer on a cart in the corner of the data center.
The Oxford Dictionary defines an anomaly as “a thing, situation, etc., that is different from what is normal or expected”. To apply this definition to networks, we first need to identify the networking constructs for which detecting abnormal or unexpected behavior has business and operational significance. Then we need to learn the normal behavior of each of these constructs and continuously determine what deviates from it.
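The two steps above, picking a construct and then learning what is normal for it on an ongoing basis, can be illustrated with a deliberately simple sketch. It assumes one numeric metric per construct (here, hypothetical per-link latency samples in milliseconds) and uses a rolling mean and standard deviation as the learned baseline. The class name, window size, and 3-sigma threshold are illustrative assumptions, not Augtera's method.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Hypothetical illustration: learn a metric's 'normal' range from a
    sliding window of recent samples and flag values far outside it
    (a plain z-score test, not a production anomaly detector)."""

    def __init__(self, window: int = 20, threshold: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)  # recent history defines "normal"
        self.threshold = threshold           # std-devs from the mean that count as abnormal
        self.min_samples = min_samples       # don't judge until some baseline exists

    def observe(self, value: float) -> bool:
        """Return True if `value` deviates from the current baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma == 0:
                # A perfectly flat history: any change at all is unexpected.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        # Only normal samples update the baseline, so a spike does not
        # immediately become the new "normal".
        if not anomalous:
            self.samples.append(value)
        return anomalous


# Hypothetical latency samples for one link; the 95 ms spike is the anomaly.
latency_ms = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 95, 20]
detector = RollingBaseline(window=10, threshold=3.0)
print([detector.observe(x) for x in latency_ms])
# → [False, False, False, False, False, False, False, False, False, False, True, False]
```

Excluding flagged samples from the baseline keeps a short spike from skewing it, at the cost of adapting slowly if a link's behavior shifts permanently; a real system would need to handle both cases.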
The question is not just a teaser. There is a fundamental reason for introducing this blog this way: every SD-WAN implementation is a vendor-specific mix of standard and proprietary building blocks, which breaks the symbiotic relationship that network engineers had with network technology in the past. Previously, network engineers built the entire network stack themselves: connecting links to the router, configuring interface IPs, the IGP, BGP peers, dampening, timers, BFD, queues, MPLS LSPs with RSVP FRR or LDP, MPLS VPN route distinguishers, route targets, and much more.
Recently, two high-profile AWS outages disrupted service to many organizations. Augtera’s Live AI detected the outages before Amazon posted a notice about them.
When we first presented at the SD-WAN Summit in 2019 on the applicability of AI/ML to SD-WAN networks, I have to admit that our pitch came from strong intuition rather than practical experience in that space. This is because most of our production deployments at that time were in the WAN and data center segments. However, looking back at those old slides today, as the new edition, SD-WAN and SASE 2021, gets underway, I am surprised by how well they resonate with our last two years of deployments in enterprise and MSP SD-WAN infrastructures.
This blog describes a real example of how an organization uses Augtera machine learning to proactively detect environmental degradation before it adversely impacts service. Machine learning and AI are not technologies that typically come to mind when you think of monitoring environmental conditions in a facility; however, they should be, and this blog will highlight why. Augtera is reinventing the way organizations operate their networks. Augtera machine learning and AI enable organizations to proactively identify conditions where failure may soon follow. In this blog, we examine a real-world example of how Augtera machine learning prevented a facility outage and describe the shortcomings of traditional monitoring systems.
NetOne Systems is participating in the ONUG Fall 2021 proof-of-concept session tomorrow. We are showcasing a practical illustration of how machine learning can enhance NetSecOps workflows to better protect cloud infrastructure. For this proof of concept, we have integrated Cloudflare, Augtera Network AI, and NetOne Cloud Controller.
When we started Augtera, our mission was to leverage machine learning and deep analytics to transform the networking industry. Today we are launching our company and our industry-first Network AI platform, a major milestone toward that goal.