AI Tool

AI-powered Runbook Generator

Web-based implementation of our revolutionary AI tool that generates incident response runbooks from natural language descriptions.

AI-powered runbook generation in action

Project Resources

Source Code
Private Repository

Created a public-facing version of our team's AI-powered system that automatically generates incident response runbooks by analyzing historical PagerDuty data, team communication patterns, and resolution strategies. This tool helped Shoreline customers and thousands of DevOps engineers understand how to resolve incidents by providing intelligent, context-aware guidance.

The system combines general-purpose LLMs with domain-specific knowledge, datasets, and retrieval augmented generation to identify patterns in incident data, extract common resolution steps, and create standardized runbooks that can be customized for specific scenarios. These runbooks are integrated with our public web-based runbook database (which I also developed), where users can search, filter, and access runbooks based on incident type, severity, and team practices.

Successfully reduced average incident resolution time by 60% and improved team coordination during critical outages. The tool directly contributed to Shoreline's eventual acquisition by NVIDIA.

Following Shoreline.io's acquisition by NVIDIA during my tenure with the company, this tool was taken offline (along with the rest of the website).

Key Features

  • LLM-powered analysis with retrieval augmented generation (RAG) for domain-specific incident knowledge
  • Automated runbook generation from historical PagerDuty data and communication patterns
  • Public web-based runbook database with search, filtering, categorization, editing, and export capabilities
  • Machine learning pattern recognition for incident type identification
  • Preset templates for common incident scenarios

Architecture & Implementation

Web Platform & Database

I developed a comprehensive public-facing platform for runbook management:

  • Public Runbook Generator: Web-based tool for generating runbooks from natural language descriptions
  • Runbook Database: Searchable catalog of hundreds of AI-generated and curated incident response runbooks
  • Advanced Filtering: Users can find runbooks by incident type, severity, technology stack, and team practices
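The filtering described above could be sketched roughly as follows. This is a minimal illustration, not the production implementation: the `Runbook` fields, the `filter_runbooks` function, the severity convention, and the sample catalog are all hypothetical, and the team-practices filter is left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Runbook:
    """Hypothetical catalog entry; field names are illustrative assumptions."""
    title: str
    incident_type: str
    severity: int          # assumed convention: 1 is the most severe
    stack: FrozenSet[str]  # technologies the runbook applies to


def filter_runbooks(
    runbooks: List[Runbook],
    incident_type: Optional[str] = None,
    max_severity: Optional[int] = None,
    technology: Optional[str] = None,
    query: Optional[str] = None,
) -> List[Runbook]:
    """Return the runbooks that match every filter that was supplied."""
    results = []
    for rb in runbooks:
        if incident_type is not None and rb.incident_type != incident_type:
            continue
        if max_severity is not None and rb.severity > max_severity:
            continue
        if technology is not None and technology not in rb.stack:
            continue
        # Free-text search is a plain case-insensitive substring match on the title.
        if query is not None and query.lower() not in rb.title.lower():
            continue
        results.append(rb)
    return results


# Made-up sample data for illustration only.
CATALOG = [
    Runbook("Postgres disk full", "storage", 1, frozenset({"postgresql", "aws"})),
    Runbook("Redis memory pressure", "memory", 2, frozenset({"redis", "kubernetes"})),
    Runbook("Pod crash loop", "availability", 2, frozenset({"kubernetes", "docker"})),
]

matches = filter_runbooks(CATALOG, max_severity=2, technology="kubernetes")
print([rb.title for rb in matches])
```

Filters that are left as `None` are skipped, so callers can combine any subset of criteria. In a PostgreSQL-backed catalog the same logic would more likely be expressed as optional `WHERE` clauses than as an in-memory loop.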

Machine Learning Pipeline

Our team built a sophisticated ML pipeline combining general-purpose LLMs with domain-specific knowledge:

  • Data Ingestion: Automated collection from PagerDuty, monitoring tools, and team communication platforms
  • RAG Implementation: Retrieval augmented generation using curated incident resolution datasets
  • Pattern Recognition: Advanced algorithms identify common incident types and successful resolution patterns
  • Content Generation: LLM-powered natural language processing creates human-readable, actionable runbook steps
  • Validation & Testing: Automated testing ensures generated runbooks are accurate and effective
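To make the retrieval and content-generation stages more concrete, here is a simplified sketch of how past resolutions might be retrieved and folded into an LLM prompt. It is an assumption-laden illustration: the function names (`tokenize`, `retrieve`, `build_prompt`), the sample notes, and the prompt wording are hypothetical, and keyword overlap (Jaccard similarity) stands in for whatever retrieval method the real pipeline used. The LLM call itself is omitted.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents ranked by Jaccard overlap with the query.

    Keyword overlap is a stand-in here; a production RAG system would
    more likely rank by embedding similarity.
    """
    q = tokenize(query)
    scored = []
    for doc in documents:
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    # sort() is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(description: str, context: List[str]) -> str:
    """Assemble a prompt that grounds the model in retrieved resolutions."""
    lines = ["You are writing an incident response runbook.", "", "Past resolutions:"]
    lines += [f"- {c}" for c in context] or ["- (none found)"]
    lines += ["", f"Incident: {description}", "Write numbered, actionable steps."]
    return "\n".join(lines)


# Made-up resolution notes for illustration only.
NOTES = [
    "Restart the postgres replica after disk full alert",
    "Rotate TLS certificates on the ingress",
    "Clear disk space on postgres host and vacuum",
]

context = retrieve("postgres disk full", NOTES)
prompt = build_prompt("postgres disk full", context)
print(prompt)
```

The resulting prompt string would then be sent to the LLM. Keeping retrieval and prompt assembly as separate functions makes each stage easy to test without a model in the loop.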

Impact & Results

The AI-powered runbook generator transformed incident response for hundreds of DevOps engineers:

  • 60% reduction in average incident resolution time
  • Hundreds of users accessing the public runbook database daily
  • Knowledge democratization by making expert incident response accessible to all team members
  • Tribal knowledge preservation by capturing and standardizing institutional knowledge
  • Accelerated onboarding by enabling new engineers to learn more effectively with AI-generated guidance
  • Strategic business impact by contributing to Shoreline's acquisition by NVIDIA

Project Details

Timeline

2 months

Role

Lead Full-Stack Engineer

Technologies & Skills

TypeScript, React, Next.js, OpenAI, PagerDuty, PostgreSQL, Golang, Python, AWS, Kubernetes, Docker, Redis, Jest, Playwright, pytest

© 2025 Gabe Wyatt. All rights reserved.