Real-time NLP: A New Era in Anomaly Detection for Logs

Introduction 

Augtera Networks differs from other vendors implementing AI/ML in that it develops its own high-performance, high-efficiency implementation rather than relying on off-the-shelf libraries. The velocity, variety, and volume of operations data continue to grow rapidly, yet enterprises generally have limited IT resources. Off-the-shelf AI/ML does not understand networking constructs, does not deliver the necessary performance, and does not reduce the resources needed to reach an outcome.

When it came to implementing natural language processing (NLP), Augtera took the same approach, spending a year developing algorithms designed for the highest performance and lowest resource usage, and implementing them with high-performance software technology. Algorithm design is at least as important as the choice of software technology.

Artificial Intelligence vs Text Matching 

NLP is a broad area of AI that includes part-of-speech tagging, statistical language modeling, syntactic analysis, semantic analysis, sentiment analysis, information retrieval, and vocabulary. Augtera’s initial focus is similarity analysis, which lets operations teams sleep better knowing they will learn of new log messages and patterns as soon as they occur, rather than weeks later, after numerous failures have occurred.

Augtera’s similarity analysis detects subtle variations in messages in a way that is closer to human judgment than what is easily achievable and maintainable with technologies like regex. Moreover, from a practical perspective, operations teams cannot write a regex for a message that has never been seen before. The ability of NLP, and specifically of Augtera’s implementation, to apply human-like judgment to similarity is why this capability is considered part of the artificial intelligence family.

Example Messages & Results

The best way to demonstrate the capability is to provide some examples. 

  1. bgp_recv: read from peer aaa.bbb.ccc.ddd [External AS ZZZZ] failed Broken Pipe 
  2. bgp_recv: read from peer aaa.bbb.ccc.ddd [External AS ZZZZ] failed Unknown Error: XXXX 
  3. rt_pfe_veto: Memory usage of M_RTNEXTHOP type = (0) Max Size possible for M_RTNEXTHOP type = (8332584960) Current delayed unref = (4281), Current unique delayed unref = (4000), Max delayed unref on this platform = (4000) 
  4. rts_veto_net_delayed_unref_limit: Memory usage of M_RTNEXTHOP type = (10144064) Max size possible for M_RTNEXTHOP type = (8332146688)  Current delayed unref = (6000) Max delayed unref on this platform = (6000) 
  5. task_addr_local: task MSDP.62.40.124.193 address 62.40.124.192: Can’t assign requested address 
  6. task_addr_local: task RV.83.97.94.109+8282 address 62.40.96.1: Invalid argument 

Using Augtera’s similarity analysis, from the above: 

  • Messages 1 and 2 would be considered similar 
  • Messages 3 and 4 would be considered similar 
  • Messages 5 and 6 would be considered NOT similar 

In the case of messages 5 and 6, the protocols (MSDP and RV) are very different, as is the textual explanation of the error: “can’t assign requested address” versus “invalid argument”. Messages 3 and 4 both report memory usage for the same M_RTNEXTHOP type and are likely related. Messages 1 and 2 are clearly similar.

Remember, this is not a rules-based approach to understanding similarity; it is based on algorithms and machine learning.
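To make the idea of message similarity concrete, here is a minimal sketch in Python. It is not Augtera’s algorithm, which is not described in this article; it swaps in a much simpler technique, Jaccard overlap between token sets after masking variable fields. The masking rules, the function names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical masking rules: variable fields are replaced with placeholders
# so that only the stable wording of a message is compared.
_IP = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
_NUM = re.compile(r"\b\d+\b")


def tokens(message):
    """Mask IPv4 addresses and numbers, then return the set of lowercase word tokens."""
    masked = _IP.sub(" IPADDR ", message)
    masked = _NUM.sub(" NUM ", masked)
    return set(re.findall(r"[a-z_]+", masked.lower()))


def jaccard(a, b):
    """Jaccard index of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar(msg_a, msg_b, threshold=0.5):
    """Illustrative decision rule; the threshold is an assumed value, not a tuned one."""
    return jaccard(tokens(msg_a), tokens(msg_b)) >= threshold


M1 = "bgp_recv: read from peer aaa.bbb.ccc.ddd [External AS ZZZZ] failed Broken Pipe"
M2 = "bgp_recv: read from peer aaa.bbb.ccc.ddd [External AS ZZZZ] failed Unknown Error: XXXX"
M3 = ("rt_pfe_veto: Memory usage of M_RTNEXTHOP type = (0) Max Size possible for "
      "M_RTNEXTHOP type = (8332584960) Current delayed unref = (4281), Current unique "
      "delayed unref = (4000), Max delayed unref on this platform = (4000)")
M4 = ("rts_veto_net_delayed_unref_limit: Memory usage of M_RTNEXTHOP type = (10144064) "
      "Max size possible for M_RTNEXTHOP type = (8332146688)  Current delayed unref = (6000) "
      "Max delayed unref on this platform = (6000)")
M5 = "task_addr_local: task MSDP.62.40.124.193 address 62.40.124.192: Can't assign requested address"
M6 = "task_addr_local: task RV.83.97.94.109+8282 address 62.40.96.1: Invalid argument"

for a, b in [(M1, M2), (M3, M4), (M5, M6)]:
    score = jaccard(tokens(a), tokens(b))
    print(f"{score:.2f} similar={similar(a, b)}")
```

On the six example messages, this toy metric scores the pairs at roughly 0.71, 0.84, and 0.31, so with the assumed threshold it happens to agree with the groupings above. A fixed token-overlap rule like this is far cruder than a learned model and should be read only as an illustration of what "similar" can mean for log text.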

Zero Day Analysis 

Augtera’s log analysis lets network operations teams see AI-detected new messages as soon as they occur, which is why we call the capability Zero Day Anomalies. The first time a new message occurs, the operations team knows about it, can perform root cause analysis, and can then decide whether to create a classifier that generates a trouble ticket on future appearances of the message. With the Augtera platform, classifiers can be created and activated in a matter of minutes; some DIY solutions can take months because a software engineering resource has to be scheduled. In addition, operations teams can apply rate-based anomaly detection to newly detected messages, flagging when the rate of a message deviates from normal.
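The two ideas in this section, flagging a message the first time it is seen and flagging an unusual rate, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Augtera platform’s implementation: the `ZeroDayDetector` class, the `rate_is_anomalous` helper, the token-overlap similarity, and both thresholds are hypothetical.

```python
import re
from statistics import mean, pstdev

_IP = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
_NUM = re.compile(r"\b\d+\b")


def tokens(message):
    """Mask IPv4 addresses and numbers, then return the set of lowercase word tokens."""
    masked = _NUM.sub(" NUM ", _IP.sub(" IPADDR ", message))
    return set(re.findall(r"[a-z_]+", masked.lower()))


def jaccard(a, b):
    """Jaccard index of two token sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


class ZeroDayDetector:
    """Remembers one token set per known pattern; reports first sightings."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed similarity cut-off
        self.patterns = []

    def observe(self, message):
        """Return True if the message resembles no known pattern, and remember it."""
        toks = tokens(message)
        if any(jaccard(toks, p) >= self.threshold for p in self.patterns):
            return False
        self.patterns.append(toks)
        return True


def rate_is_anomalous(history, current, z_threshold=3.0):
    """Flag a per-interval count that deviates from the historical mean
    by more than z_threshold population standard deviations."""
    if len(history) < 2:
        return False  # not enough data to define "normal"
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


detector = ZeroDayDetector()
for line in [
    "bgp_recv: read from peer aaa.bbb.ccc.ddd [External AS ZZZZ] failed Broken Pipe",
    "bgp_recv: read from peer aaa.bbb.ccc.ddd [External AS ZZZZ] failed Unknown Error: XXXX",
    "task_addr_local: task RV.83.97.94.109+8282 address 62.40.96.1: Invalid argument",
]:
    print(detector.observe(line), line[:40])
```

The linear scan over known patterns keeps the sketch short; a system handling the message volumes discussed in this article would need an indexed or otherwise more efficient lookup.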

Conclusion 

What Augtera has delivered is not an AI proof of concept (POC) achieved with unlimited resources. Network operations teams often face significant limits on how much compute and memory they can dedicate to a platform. Augtera’s implementation scales to hundreds of millions of messages per hour without loss, with a low CPU and memory footprint. This is the value of creating our own high-performance, high-efficiency implementation.

Augtera’s high-performance, high-efficiency zero-day capability is an industry first for network operations teams. It marks a new era in using real-time NLP for log analysis, with more innovation to come.