
AI-Powered Log & Observability Platform

Proposed and evaluated SigNoz as a cost-effective, AI-powered observability platform to unify logs, metrics, and traces in a single pane.

 

Focused on improving developer troubleshooting efficiency through fast querying, detailed insights into logs and device configurations, seamless trace-to-log correlation, anomaly detection, and pattern recognition.
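As a rough illustration of log pattern recognition, the Python sketch below collapses log lines into templates by masking variable tokens (IP addresses, hex values, numbers) and counts how often each template occurs. This is a simplified stand-in, not SigNoz's internal algorithm; the masking rules, function names, and sample lines are hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order: IPs and hex values first so
# their digits are not masked piecemeal by the generic number rule.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Reduce a raw log line to a template by masking variable tokens."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def top_patterns(lines, n=3):
    """Return the n most frequent templates as (template, count) pairs."""
    return Counter(to_template(line) for line in lines).most_common(n)
```

Grouping by template turns thousands of near-identical lines into a short ranked list, which makes a new or fast-growing pattern easy to spot.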
 


The Problem: Isolated Logs & Limited Observability

Isolated Logs: 

  • NSO logs (device trace, Java, Python, and northbound logs) are stored only on individual VMs; if a VM crashes, its logs are lost, making recovery and analysis difficult.

  • No centralized platform exists to collect, store, and access logs from across VMs.
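One prerequisite for centralizing logs is that every record carries its origin, so logs from different VMs stay distinguishable once they land in a shared store. A minimal sketch using Python's standard `logging` module, assuming records are written as JSON lines that a collector then ships off the VM; `host.name` and `service.name` follow OpenTelemetry resource-attribute naming, while `log.source` and the class name are hypothetical.

```python
import json
import logging
import socket
from typing import Optional


class CentralJsonFormatter(logging.Formatter):
    """Render each record as one JSON line tagged with its origin VM and log source."""

    def __init__(self, service_name: str, log_source: str, host: Optional[str] = None):
        super().__init__()
        self.service_name = service_name
        self.log_source = log_source  # hypothetical, e.g. "devel.log"
        self.host = host or socket.gethostname()

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "severity": record.levelname,
            "body": record.getMessage(),
            "host.name": self.host,             # OpenTelemetry resource attribute name
            "service.name": self.service_name,  # OpenTelemetry resource attribute name
            "log.source": self.log_source,      # hypothetical attribute name
        })
```

Attaching this formatter to a file handler on each VM yields self-describing records that a collector (for example, an OpenTelemetry Collector tailing the file) could forward; the shipping step itself is outside this sketch.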

Limited Observability: 

  • The current observability setup restricts exports of logs and device configurations, which prevents quick log queries and slows down root-cause analysis and troubleshooting.

  • No metrics are derived from detailed NSO logs (e.g., exception monitoring, device performance), which makes performance issues hard to identify and improve.
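An exception-monitoring metric of the kind described can be sketched as a per-minute count of exception records, with a simple z-score check to flag unusual spikes. The detection rule (substring match on `Exception`), the 3-sigma threshold, and the function names are assumptions, and the z-score test is a basic stand-in for whatever anomaly detection a platform actually provides.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev
from typing import Iterable, Sequence, Tuple


def exceptions_per_minute(records: Iterable[Tuple[datetime, str]]) -> Counter:
    """Count records whose message mentions an exception, bucketed by minute."""
    counts: Counter = Counter()
    for ts, message in records:
        if "Exception" in message:  # assumed detection rule
            counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


def is_anomalous(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it exceeds the mean of `history` by more than `threshold` std devs."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase stands out
    return (current - mu) / sigma > threshold
```

Feeding the latest minute's count and the preceding window's counts into `is_anomalous` gives a simple alert condition.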


My Role

  • Developed a deep understanding of Cisco NSO workflows and logging by exploring NSO flows, YANG models, NETCONF/RESTCONF APIs, and log structures to identify observability gaps.

 

  • Defined current and long-term observability requirements, focusing on unified visibility across logs, metrics, and traces with AI-assisted insights.

 

  • Researched and compared multiple observability platforms, evaluating usability, integration effort, performance, and cost trade-offs.


  • Conducted a hands-on prototype setup of SigNoz: integrated NSO logs, extracted log attributes for feature engineering, and configured SQL-based metrics and exception alerts to evaluate troubleshooting workflows and operational feasibility.


  • Evaluated ingestion latency, query performance, memory usage, and estimated cost impact.
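The attribute-extraction step can be sketched as a regular expression that turns a raw line into typed fields (severity, timestamp, host, process, PID, message) that can then be indexed and queried. The line layout assumed here (`<SEVERITY> DD-Mon-YYYY::HH:MM:SS.mmm host process[pid]: message`) is only an illustration and should be checked against the actual NSO log files; the regex and function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed line layout; verify against real NSO log files before relying on it.
LINE_RE = re.compile(
    r"<(?P<severity>[A-Z]+)>\s+"
    r"(?P<ts>\d{2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<process>[\w.-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<message>.*)"
)


def extract_attributes(line: str) -> Optional[dict]:
    """Parse one log line into typed attributes, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    attrs = m.groupdict()
    # %b expects English month abbreviations (default C locale assumed).
    attrs["ts"] = datetime.strptime(attrs["ts"], "%d-%b-%Y::%H:%M:%S.%f")
    attrs["pid"] = int(attrs["pid"])
    return attrs
```

Extracting fields once at ingestion, rather than re-parsing message text at query time, is what lets filters such as "severity = ERROR on host X" run as cheap attribute lookups.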

Impact & Metrics

Cost Impact

Projected observability cost savings: ~46% with SigNoz Cloud; up to 90% with self-hosted open-source deployment.

ROI Assessment

Estimated ROI of up to 9× relative to other observability platforms, based on modeling of ingestion volume, retention, and infrastructure overhead.
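The cost figures above come from the author's own modeling. As a hedged sketch of the shape such a model can take, monthly cost can be approximated as ingestion charges plus retained-storage charges plus fixed infrastructure overhead, and savings as the relative difference from a baseline. Every price and field name below is a hypothetical placeholder, not a vendor rate.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical inputs for one platform; replace with real quotes and measurements."""
    ingest_gb_per_day: float
    retention_days: int
    price_per_ingested_gb: float      # placeholder usage-based ingestion price
    price_per_stored_gb_month: float  # placeholder price for retained data
    fixed_monthly_overhead: float     # e.g. self-hosted VMs and operations time


def monthly_cost(c: CostInputs, days_per_month: int = 30) -> float:
    """Ingestion charges + steady-state retained storage + fixed overhead."""
    ingested_gb = c.ingest_gb_per_day * days_per_month
    retained_gb = c.ingest_gb_per_day * c.retention_days  # volume held once retention is full
    return (ingested_gb * c.price_per_ingested_gb
            + retained_gb * c.price_per_stored_gb_month
            + c.fixed_monthly_overhead)


def savings_pct(baseline_cost: float, candidate_cost: float) -> float:
    """Percentage saved by the candidate relative to the baseline."""
    return 100.0 * (baseline_cost - candidate_cost) / baseline_cost
```

Running `monthly_cost` once per platform with its own prices (and, for self-hosting, its VM and operations overhead) gives comparable figures for `savings_pct`. The model is linear in volume and ignores label cardinality.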

Operational Efficiency

Making all logs available and unifying logs, metrics, and traces enabled faster root-cause analysis, cutting time-to-resolve by over 50% and reducing reliance on customer-reported incidents to surface issues.

Developer Productivity

Centralized querying and trace-to-log correlation significantly improved troubleshooting efficiency and reduced time spent context-switching across tools.
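Trace-to-log correlation relies on each log record carrying the ID of the trace it was emitted under. A minimal sketch with Python's `logging` and `contextvars`: a filter stamps the active trace ID onto every record, and records can then be selected by trace ID. In practice a tracing SDK such as OpenTelemetry manages the active context; `current_trace_id` and the helper names here are hypothetical.

```python
import contextvars
import logging
import secrets

# Hypothetical holder for the active trace ID; a tracing SDK would normally manage this.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every record so logs can be joined to traces."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


class ListHandler(logging.Handler):
    """Collect records in memory (stands in for a central log store)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def new_trace_id() -> str:
    # W3C Trace Context trace IDs are 16 bytes rendered as 32 lowercase hex chars.
    return secrets.token_hex(16)


def logs_for_trace(records, trace_id: str):
    """Return the messages of records carrying the given trace ID."""
    return [r.getMessage() for r in records if getattr(r, "trace_id", "") == trace_id]
```

Inside a request, set `current_trace_id` to a fresh ID, log as usual, then reset it; every record emitted in between can be retrieved with `logs_for_trace`.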

System Reliability

Improved visibility into exceptions and performance trends helped lower the risk of blind spots and data loss in distributed systems.

Key Learnings

  • Observability is more than monitoring: Metrics alone are insufficient; combining logs, traces, and metrics is critical for effective root-cause analysis in distributed systems like Cisco NSO.

 

  • Data modeling directly impacts performance and cost: Log structure, attribute extraction, and indexing strategy significantly affect ingestion latency, query speed, memory usage, and overall observability spend.

 

  • Prototype validation reduces adoption risk: Hands-on prototype setups uncover integration complexity, performance bottlenecks, and operational trade-offs that are not evident from documentation or vendor comparisons alone.

 

  • Cost grows non-linearly with scale: Log volume, retention policies, and cardinality are primary cost drivers, making open-source and self-hosted options attractive for high-throughput systems.

 

  • Unified observability improves developer efficiency: Trace-to-log correlation and centralized querying reduce context switching and enable earlier issue detection before customer impact.

 

  • Product decisions require technical depth: Effective platform evaluation requires balancing usability, scalability, performance, and cost, not just feature completeness.
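The cardinality point can be made concrete with simple arithmetic. For a metric, the worst-case number of distinct time series is the product of the distinct-value counts of its labels, so a single high-cardinality label multiplies the total. The sketch below computes that upper bound; the label names and counts are hypothetical, and real series counts depend on which label combinations actually occur.

```python
from collections.abc import Mapping
from math import prod


def series_upper_bound(label_cardinalities: Mapping[str, int]) -> int:
    """Worst-case distinct time series for one metric: product of per-label value counts.

    An empty mapping yields 1 (a single unlabeled series).
    """
    return prod(label_cardinalities.values())
```

For example, labels `device` (200 values), `severity` (5), and `host` (4) give at most 4,000 series; adding a `session_id` label with 10,000 values raises the bound to 40,000,000.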
