So Many Events: Which One Is Important?
Let’s say that on any given morning, you can take one of two paths out of your neighborhood. Should you take the scenic route through the back roads, or risk the traffic on the highway to save a few minutes? But what if you don’t really have a choice one morning? What if there is an accident on the highway and you must take the back roads? The accident on the highway leaves you with the proverbial single point of failure.
In networks we deal with the same thing. The sheer number of events makes it overwhelming for operators to triage them and identify their impact. Some networks experience thousands of link down events every hour. The challenge is determining when a site is down or has lost redundancy. To make our lives simpler, we come up with fancy interface descriptions like “Critical A-Path to Data Center”, so an operator knows that the interface is critical and that it is the “A-path”. However, the events that arrive through syslogs and SNMP traps do not carry these descriptions and, worse, not all interface descriptions line up.
Augtera To The Rescue
Augtera continuously searches through the set of paths from a device or group of devices based on the network topology that has been dynamically discovered or ingested through the Topology API. In many cases, customers leverage both methods to fully complete the topology.
Real-time changes to the topology are detected from SNMP traps, syslog messages, or gRPC/gNMI. Link down and protocol neighbor relationship events are used to update the topology. A link down event changes the topology so that a device or set of devices either loses a path or has no connectivity at all.
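To make the idea concrete, here is a minimal sketch of how link down events could update a topology graph and reveal an isolated device. It is not Augtera's implementation: the Topology class and its methods are hypothetical, and the link between L3_ASR and LAB-SR is assumed for illustration only.

```python
from collections import defaultdict, deque


class Topology:
    """Hypothetical undirected graph of devices (illustration only)."""

    def __init__(self, links):
        self.adj = defaultdict(set)
        for a, b in links:
            self.link_up(a, b)

    def link_up(self, a, b):
        # Record the link in both directions.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def link_down(self, a, b):
        # Remove the link; the devices themselves stay in the graph.
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def reachable(self, start):
        # Breadth-first search over the links that are still up.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in self.adj[node]:
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        return seen

    def isolated(self):
        # Devices with no remaining links.
        return sorted(d for d, peers in self.adj.items() if not peers)


topo = Topology([
    ("lab-ex4300-01", "L3_ASR"),
    ("lab-ex4300-01", "LAB-SR"),
    ("L3_ASR", "LAB-SR"),  # assumed link, not taken from the article
])

topo.link_down("lab-ex4300-01", "L3_ASR")
after_one = sorted(topo.adj["lab-ex4300-01"])  # one path left

topo.link_down("lab-ex4300-01", "LAB-SR")
print(topo.isolated())  # ['lab-ex4300-01']
```

After the first event the device still has one path; after the second it has none, which is the difference between lost redundancy and lost connectivity.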
To bring it all together, Augtera uses metadata to treat a group of devices in the topology as a set. If either the left or right link exiting the box (set of devices) goes down, then Augtera sends a notification that this set of devices has a single point of failure. Augtera can also notify if the connections or paths within the set result in a device being isolated, for example when lab-ex4300-01’s connections to the L3_ASR and LAB-SR are down.
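The set-based check described above can be sketched as counting the links that cross the boundary of a device set. This is a simplified illustration, not Augtera's algorithm: the function names, the device names (leaf-1, leaf-2, core-a, core-b) and the three state labels are all hypothetical, and a real check would also need to consider whether the remaining exit links share an upstream device.

```python
def exit_links(up_links, device_set):
    """Return the up links with exactly one end inside device_set."""
    return [(a, b) for a, b in up_links
            if (a in device_set) != (b in device_set)]


def redundancy_state(up_links, device_set):
    """Classify a device set by how many exit links are still up."""
    count = len(exit_links(up_links, device_set))
    if count == 0:
        return "isolated"
    if count == 1:
        return "single point of failure"
    return "redundant"


# Hypothetical site made of two devices with one uplink each.
site = {"leaf-1", "leaf-2"}
up_links = [
    ("leaf-1", "leaf-2"),  # internal link, does not exit the set
    ("leaf-1", "core-a"),  # "left" exit link
    ("leaf-2", "core-b"),  # "right" exit link
]

before = redundancy_state(up_links, site)

# The right exit link goes down.
up_links.remove(("leaf-2", "core-b"))
after = redundancy_state(up_links, site)

print(before, "->", after)  # redundant -> single point of failure
```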
The assessment for single points of failure is continuous. Here is an example from a production deployment, where Augtera detected that an event triggered a loss of redundancy.
Notifications are deduplicated to avoid sending a continuous stream of alerts.
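The article does not say how the deduplication works. One common approach, shown here purely as an assumption, is to notify only when the state of a device set changes, so that repeated evaluations of the same condition stay silent. The AlertDeduplicator class and the state labels are hypothetical.

```python
class AlertDeduplicator:
    """Hypothetical sketch: notify only on a change of state."""

    def __init__(self, healthy_state="redundant"):
        self.healthy_state = healthy_state
        self.last_state = {}  # device set name -> last reported state

    def should_notify(self, set_name, state):
        # A set that has never been seen is assumed to be healthy.
        previous = self.last_state.get(set_name, self.healthy_state)
        self.last_state[set_name] = state
        return state != previous


dedup = AlertDeduplicator()

# The same condition is evaluated repeatedly by a continuous assessment.
results = [
    dedup.should_notify("site-a", "single point of failure"),  # new condition
    dedup.should_notify("site-a", "single point of failure"),  # duplicate
    dedup.should_notify("site-a", "single point of failure"),  # duplicate
    dedup.should_notify("site-a", "redundant"),                # condition cleared
]
print(results)  # [True, False, False, True]
```

Keying on the device set rather than on the individual link event means that many link down events for the same site collapse into one notification.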
Augtera provides the capability to both configure and verify the intent of the network by analyzing the impact of link state events and the resulting loss of redundancy or connectivity. Reports for non-conformance would mirror the type of alert shown above based on the behavior of the device.
To schedule a 30-minute discussion with an engineer on how Augtera Network AI can help with your network challenges, please contact us. Thanks for reading our blog.
Click this link to learn more about Augtera Networks’ Network AI.