On-call scheduling and Escalation Policies

Last updated 4 months ago

On-call Policy as defined in Temperstack

In Temperstack, an on-call policy is linked to a specific team or group. It contains the roster of first-level responders on call and the escalation policy, which defines when and to whom a notification escalates.

Example

Consider a team consisting of Hari, Mohan, Haarvish, and ERA, who rotate weekly for on-call duties. Imagine an incident occurs on November 25 at 11:00 AM:

  • Level 1:

    • Hari is the primary on-call engineer and receives the alert first.

    • If Hari cannot acknowledge or resolve the issue within the designated time, it escalates to Level 2.

  • Level 2:

    • Haarvish is on duty at Level 2 during this time (from 10:00 AM, November 25, to 10:00 AM, November 26).

    • Haarvish takes over the incident if Hari does not respond.

  • Level 3:

    • If neither Hari nor Haarvish resolves the incident, it escalates to Level 3, where ERA takes responsibility.

Typically, one team has one on-call policy. That policy can be mapped to multiple services, provided all of those services are responded to by, and escalated within, the same team.

However, each service can have only one on-call Policy.

If two services share the same first responders but escalate to different people, you need to define two separate on-call policies and map each one to its respective service.
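The mapping rules above (one policy can serve many services, but each service has exactly one policy) can be pictured with a small data model. This is only an illustrative sketch: the class names, field names, and service names are hypothetical and do not reflect Temperstack's internal model or API.

```python
from dataclasses import dataclass, field


@dataclass
class OnCallPolicy:
    # Hypothetical representation of a policy: who responds first, who is escalated to.
    name: str
    first_responders: list[str]
    escalation_contacts: list[str]


@dataclass
class ServiceRegistry:
    # Hypothetical in-memory registry; a dict key enforces one policy per service.
    policy_by_service: dict[str, OnCallPolicy] = field(default_factory=dict)

    def assign(self, service: str, policy: OnCallPolicy) -> None:
        # Re-assigning replaces the previous policy, so a service never holds two.
        self.policy_by_service[service] = policy

    def services_for(self, policy_name: str) -> list[str]:
        # One policy may be shared by several services.
        return sorted(
            s for s, p in self.policy_by_service.items() if p.name == policy_name
        )


registry = ServiceRegistry()
shared = OnCallPolicy("payments-team", ["Hari", "Mohan"], ["ERA"])
# Same first responders, different escalation contact: needs its own policy.
variant = OnCallPolicy("payments-team-alt", ["Hari", "Mohan"], ["Haarvish"])

registry.assign("checkout", shared)
registry.assign("billing", shared)
registry.assign("refunds", variant)

print(registry.services_for("payments-team"))
```

Here "checkout" and "billing" share one policy, while "refunds" gets a second policy because its escalation contact differs.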

On-call schedules ensure that there are always team members available to handle any issues, including during nights, weekends, and holidays, ensuring continuous support. These schedules rotate among team members to distribute responsibilities fairly.

Escalation policies detail the procedure for escalating unresolved issues to higher levels of support. For example, if a server goes down, alerts are sent to the designated person on the escalation list, who is responsible for addressing the issue promptly.
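As a rough illustration of sequential escalation, the sketch below works out who has been notified for an alert that remains unacknowledged. It assumes each level waits a fixed number of minutes before handing over to the next one; the `Level` class, the `notified_so_far` function, and the timeout values are hypothetical and are not taken from Temperstack.

```python
from dataclasses import dataclass


@dataclass
class Level:
    responder: str
    escalate_after_min: int  # assumed wait for acknowledgement before escalating


def notified_so_far(levels: list[Level], minutes_unacknowledged: int) -> list[str]:
    """Return everyone notified so far for an alert that is still unacknowledged."""
    notified = []
    threshold = 0  # minutes after which the current level gets notified
    for level in levels:
        if minutes_unacknowledged < threshold:
            break
        notified.append(level.responder)
        threshold += level.escalate_after_min
    return notified


# Illustrative timeouts only: 10 minutes at Level 1, 15 more at Level 2.
levels = [Level("Hari", 10), Level("Haarvish", 15), Level("ERA", 0)]

print(notified_so_far(levels, 0))   # ['Hari']
print(notified_so_far(levels, 12))  # ['Hari', 'Haarvish']
print(notified_so_far(levels, 25))  # ['Hari', 'Haarvish', 'ERA']
```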

Create On-call Schedule with the Following Steps

Step 1: Navigate to Temperstack Notifications in the top menu and click On-Call Policies - The on-call policy determines who is contacted when an incident arises and defines the escalation process for a particular service or group of services.

Step 2: To create a new on-call policy, click the Add On-Call Policy button at the top right - Here, you can create rotations so that an alert escalates to the next engineer if the current one fails to respond within the specified time.

Step 3: Enter “Policy Name” - This is the name of the group or team that can be assigned to the Temperstack services. The Start Date indicates when this policy was created.

Step 4: Select Your Time Zone - Choose the appropriate time zone for the policy.

Step 5: Next to the "Start Date," you'll find a "Repeat" option where you can select either "Yes" or "No." - This option determines whether the escalation schedule should repeat after the last user in the rotation has been reached.

Step 6: Click on "Add Rotation" - This creates a rotation level through which alerts escalate to on-call engineers within a given timeframe.

Step 7: After adding a rotation, you'll see "Levels" indicating the number of rotations, including the users added to this policy.

  • Escalate at: Denotes the time duration within which the alert is escalated to the next person if the first person does not acknowledge it.

  • Rotation Frequency: Choose how often the rotation should repeat:

    • Daily: The rotation resets every day.

    • Weekly: The rotation resets every week.

    • Custom: Define a custom rotation period.

  • Specify the Rotation Time:

    • Select the start and end times of the rotation.

    • Define the shift duration frequency (i.e., the time period for each shift).

  • On-Call Engineers:

    • Select the names of users to add to the rotation.

    • The on-call rotation will proceed in the order they are added, based on the rotation frequency.
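To make the rotation settings concrete, here is a minimal sketch of how an ordered list of engineers and a shift length could determine who is on call at a given moment. It reflects one interpretation of the "Repeat" option (wrap back to the first engineer after the last one), and the function name, parameters, and the year 2024 are assumptions chosen for illustration, not Temperstack's implementation.

```python
from datetime import datetime, timedelta


def on_call_at(engineers, rotation_start, shift_length, when, repeat=True):
    """Return the engineer on shift at `when`, or None if nobody is scheduled."""
    if not engineers or when < rotation_start:
        return None
    # Whole shifts elapsed since the rotation began (timedelta // timedelta -> int).
    shift_index = (when - rotation_start) // shift_length
    if repeat:
        # Wrap around to the first engineer once the list is exhausted.
        return engineers[shift_index % len(engineers)]
    return engineers[shift_index] if shift_index < len(engineers) else None


team = ["Hari", "Mohan", "Haarvish", "ERA"]   # order in which they were added
start = datetime(2024, 11, 25, 10, 0)         # year chosen arbitrarily
weekly = timedelta(weeks=1)

print(on_call_at(team, start, weekly, datetime(2024, 11, 25, 11, 0)))  # Hari
print(on_call_at(team, start, weekly, datetime(2024, 12, 3, 9, 0)))    # Mohan
print(on_call_at(team, start, weekly, start + timedelta(weeks=4, hours=1)))  # Hari
```

With `repeat=False`, the last call would return `None` instead, because the four weekly shifts have been used up.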

Step 8: View the Rotation Graph - Below the rotation box, you will find a graph that provides:

  • Date: Displays the timeline of the current day.

  • Hourly-based View: Shows user rotations in an hour-by-hour format, making it easy to visualize the shifts assigned to each on-call engineer.

Note: Only on-call engineers with verified numbers are eligible.

Once Level 1 is set up, you have the option to set up Level 2, if necessary, by clicking on "Add Rotation" and following the same steps as described above. If you do not require another level, click on "Submit" beside the "Add Rotation" button to finalize the schedule creation. The newly created on-call schedule will now appear on the list.

Edit On-call schedule with the following steps

  1. Click on the pencil icon located on the right-hand side of the list of all schedules.

  2. Edit the escalation policy as needed.

  3. Simply update it by hitting the "Submit" button.

Delete On-call schedule with the following steps

  1. Locate the dustbin icon on the right-hand side of the list of all schedules.

  2. Simply click on the dustbin icon to delete the policy.

Final Escalation Policy

The Final Escalation Policy serves as the last resort for ensuring incident resolution within Temperstack's incident management system. Here's a breakdown of the key components and how to configure them:

Calendar of Rotations

  • The timeline displays shifts assigned to engineers hour by hour over a selected date range.

  • Shifts are color-coded for clarity, distinguishing responsibilities at different levels (e.g., Level 1).

  • Rotation-based shift assignment enables the creation of a team member list, with shifts automatically generated by Temperstack in the order you have them listed.
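The idea of generating shifts automatically from an ordered team list can be sketched as follows. This is a simplified illustration rather than Temperstack's actual scheduling logic; `generate_shifts`, its parameters, and the dates are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import cycle


def generate_shifts(engineers, first_shift_start, shift_length, window_end):
    """Yield (start, end, engineer) tuples up to `window_end`, in list order."""
    start = first_shift_start
    for engineer in cycle(engineers):  # cycle() restarts the list after the last name
        if start >= window_end:
            return
        # Clip the final shift so it does not run past the displayed window.
        end = min(start + shift_length, window_end)
        yield start, end, engineer
        start += shift_length


team = ["Hari", "Mohan", "Haarvish"]
shifts = list(
    generate_shifts(
        team,
        first_shift_start=datetime(2024, 11, 25, 10, 0),  # year chosen arbitrarily
        shift_length=timedelta(days=1),
        window_end=datetime(2024, 11, 29, 10, 0),
    )
)

for start, end, engineer in shifts:
    print(f"{start:%b %d %H:%M} -> {end:%b %d %H:%M}  {engineer}")
```

Each printed row corresponds to one block on an hour-by-hour timeline; the fourth shift goes back to the first engineer in the list.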

Key Functions

  1. Timely Notifications:

    • Connect services to on-call schedules for prompt incident alerts to designated engineers.

  2. Sequential Escalation:

    • Configure alerts to escalate in a predefined order and time frame, ensuring prompt resolution by the appropriate personnel.

  3. Customization:

    • Tailor escalation policies to specific service requirements, optimizing incident management across diverse portfolios.

Implementing effective escalation policies empowers organizations to proactively address incidents, minimize downtime, and maintain service reliability, ultimately enhancing overall operational resilience.

Know more about Temperstack On-call and Scheduling Policy here.
Navigation: On-Call Policies > (select any one policy) > scroll down.