On-call Scheduling and Escalation Policies

On-call Policy as defined in Temperstack

In Temperstack, an on-call policy is linked to a specific team or group. It contains the roster of first-level responders on call and the escalation policy, which defines when and to whom a notification escalates.

Example

Consider a team consisting of Hari, Mohan, Haarvish, and ERA, who rotate weekly for on-call duties. Imagine an incident occurs on November 25 at 11:00 AM:

Level 1:
- Hari is the primary on-call engineer and receives the alert first.
- If Hari cannot acknowledge or resolve the issue within the designated time, it escalates to Level 2.
Level 2:
- Haarvish is on duty at Level 2 during this time (from 10:00 AM, November 25, to 10:00 AM, November 26).
- Haarvish takes over the incident if Hari does not respond.
Level 3:
- If neither Hari nor Haarvish resolves the incident, it escalates to Level 3, where ERA takes responsibility.
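
The escalation chain in this example can be sketched in a few lines of Python. This is only an illustrative model, not Temperstack's implementation: the `Level` class, the `responder_at` function, and the 15-minute timeouts are assumptions, since the example does not state the designated time.

```python
from dataclasses import dataclass


@dataclass
class Level:
    number: int
    engineer: str
    escalate_after_min: int  # assumed timeout; the example does not give one


def responder_at(levels, minutes_unacknowledged):
    """Return the engineer holding the alert after it has gone
    unacknowledged for the given number of minutes."""
    elapsed = 0
    for level in levels:
        elapsed += level.escalate_after_min
        if minutes_unacknowledged < elapsed:
            return level.engineer
    # Past the last timeout, the alert stays with the final level.
    return levels[-1].engineer


levels = [
    Level(1, "Hari", 15),
    Level(2, "Haarvish", 15),
    Level(3, "ERA", 15),
]

print(responder_at(levels, 5))   # Hari
print(responder_at(levels, 20))  # Haarvish
print(responder_at(levels, 45))  # ERA
```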

Typically, one team has one on-call policy, which can be mapped to multiple services if those services are responded to by, and escalated to, the same team.

However, each service can have only one on-call policy.

If two services share the same first responders but escalate to different people, define two separate on-call policies and map each one to its respective service.
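
One way to picture this mapping is as two lookup tables. The sketch below is a simplified model under stated assumptions: the policy names, service names, and helper functions are hypothetical and do not come from Temperstack.

```python
# Hypothetical policies: same first responder, different escalation paths.
policies = {
    "platform-primary": ["Hari", "Haarvish", "ERA"],
    "platform-alt": ["Hari", "Mohan"],
}

# A dict keyed by service allows exactly one policy per service,
# while one policy may be the value for several services.
service_policy = {
    "checkout-api": "platform-primary",
    "billing-api": "platform-primary",
    "search-api": "platform-alt",
}


def escalation_chain(service):
    """Return the ordered responders of the policy mapped to a service."""
    return policies[service_policy[service]]


def services_using(policy_name):
    """List every service mapped to the given policy."""
    return sorted(s for s, p in service_policy.items() if p == policy_name)


print(escalation_chain("billing-api"))
print(services_using("platform-primary"))
```

Here `checkout-api` and `search-api` share Hari as the first responder but escalate differently, so they need two separate policies.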

Learn more about the Temperstack On-call and Scheduling Policy here.

On-call schedules ensure that team members are always available to handle issues, including during nights, weekends, and holidays. These schedules rotate among team members to distribute responsibilities fairly.

Escalation policies detail the procedure for escalating unresolved issues to higher levels of support. For example, if a server goes down, alerts are sent to the designated person on the escalation list, who is responsible for addressing the issue promptly.

Create On-call Schedule with the Following Steps

Step 1: Navigate to Temperstack Notifications in the top menu and click On-Call Policies - The on-call policy determines who is contacted when an incident arises and outlines the escalation process for a particular service or group of services.

Step 2: To create a new on-call policy, click the Add On-Call Policy button at the top right - Here, you can create rotations through which an alert escalates to other engineers if the on-call engineer fails to respond within the specified time.

Step 3: Enter "Policy Name" - This is the name of the group or team that can be assigned to Temperstack services. The "Start Date" indicates when this policy was created.

Step 4: Select Your Time Zone - Choose the appropriate time zone for the policy.

Step 5: Next to the "Start Date," you'll find a "Repeat" option where you can select either "Yes" or "No" - This option determines whether the escalation schedule should repeat after the last user in the rotation has been reached.
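
The effect of the "Repeat" option can be illustrated with a small sketch. It assumes that with "No" selected, nobody is scheduled once the last engineer's shift has passed. That behavior and the `engineer_for_shift` helper are assumptions made for illustration, not documented Temperstack behavior.

```python
def engineer_for_shift(engineers, shift_index, repeat):
    """Return who covers the shift at position shift_index (0-based).

    With repeat=True the list wraps around after the last engineer.
    With repeat=False, shifts beyond the list are left uncovered (assumed).
    """
    if repeat:
        return engineers[shift_index % len(engineers)]
    if shift_index < len(engineers):
        return engineers[shift_index]
    return None


team = ["Hari", "Mohan", "Haarvish", "ERA"]

print(engineer_for_shift(team, 4, repeat=True))   # Hari (wraps around)
print(engineer_for_shift(team, 4, repeat=False))  # None (rotation ended)
```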

Step 6: Click on "Add Rotation" - This action creates a rotation level through which alerts escalate between on-call engineers within a given timeframe.

Step 7: After adding a rotation, you'll see "Levels" indicating the number of rotations, including the users added to this policy.

Escalate at: The time after which the alert is escalated to the next person if the first person does not acknowledge it.
Rotation Frequency: Choose how often the rotation should repeat:
- Daily: The rotation resets every day.
- Weekly: The rotation resets every week.
- Custom: Define a custom rotation period.

Specify the Rotation Time:
- Select the start and end times of the rotation.
- Define the shift duration (i.e., the time period for each shift).

On-Call Engineers:
- Select the names of users to add to the rotation.
- The on-call rotation will proceed in the order they are added, based on the rotation frequency.
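
The way rotation frequency and engineer order combine can be sketched as follows. This is a minimal model, assuming the rotation wraps around after the last engineer and all times are in the policy's time zone. The `on_call_at` function, the year 2024, and the 10:00 AM start are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Fixed periods for the built-in frequencies; "custom" supplies its own.
FREQUENCIES = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def on_call_at(engineers, start, frequency, when, custom_period=None):
    """Return the engineer on call at `when`, or None before the rotation starts."""
    period = custom_period if frequency == "custom" else FREQUENCIES[frequency]
    if when < start:
        return None
    shifts_elapsed = (when - start) // period  # whole shifts since the start
    return engineers[shifts_elapsed % len(engineers)]


team = ["Hari", "Mohan", "Haarvish", "ERA"]  # order in which users were added
start = datetime(2024, 11, 25, 10, 0)

print(on_call_at(team, start, "weekly", datetime(2024, 11, 25, 11, 0)))  # Hari
print(on_call_at(team, start, "weekly", datetime(2024, 12, 2, 10, 0)))   # Mohan
print(on_call_at(team, start, "custom", datetime(2024, 11, 25, 23, 0),
                 custom_period=timedelta(hours=12)))                     # Mohan
```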

Step 8: View the Rotation Graph - Below the rotation box, you will find a graph that provides:

Date: Displays the timeline of the current day.
Hourly-based View: Shows user rotations in an hour-by-hour format, making it easy to visualize the shifts assigned to each on-call engineer.

Note: Only on-call engineers with verified numbers are eligible.

Once Level 1 is set up, you have the option to set up Level 2, if necessary, by clicking on "Add Rotation" and following the same steps as described above. If you do not require another level, click on "Submit" beside the "Add Rotation" button to finalize the schedule creation. The newly created on-call schedule will now appear on the list.

Edit On-call Schedule with the Following Steps

Click on the pencil icon located on the right-hand side of the list of all schedules.
Edit the escalation policy as needed.
Click the "Submit" button to save your changes.

Delete On-call Schedule with the Following Steps

Locate the dustbin icon on the right-hand side of the list of all schedules.
Simply click on the dustbin icon to delete the policy.

Final Escalation Policy

The Final Escalation Policy serves as the last resort for ensuring incident resolution within Temperstack's incident management system. Here's a breakdown of the key components and how to configure them:

Calendar of Rotations

The timeline displays shifts assigned to engineers hour by hour over a selected date range.
Shifts are color-coded for clarity, distinguishing responsibilities at different levels (e.g., Level 1).
Rotation-based shift assignment enables the creation of a team member list, with shifts automatically generated by Temperstack in the order you have them listed.
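
To illustrate how shifts might be generated from an ordered team list over a date range, here is a rough sketch. It is not Temperstack's algorithm: the `generate_shifts` function, the 12-hour shift length, and the dates (including the year) are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def generate_shifts(engineers, range_start, range_end, shift_length):
    """Build (start, end, engineer) tuples covering the date range,
    assigning engineers in list order and wrapping around."""
    shifts = []
    shift_start = range_start
    index = 0
    while shift_start < range_end:
        # Trim the final shift so it does not run past the range.
        shift_end = min(shift_start + shift_length, range_end)
        shifts.append((shift_start, shift_end, engineers[index % len(engineers)]))
        shift_start = shift_end
        index += 1
    return shifts


team = ["Hari", "Mohan", "Haarvish", "ERA"]
shifts = generate_shifts(
    team,
    datetime(2024, 11, 25, 10, 0),
    datetime(2024, 11, 27, 10, 0),
    timedelta(hours=12),
)

for begin, end, name in shifts:
    print(f"{begin:%b %d %H:%M} to {end:%b %d %H:%M}: {name}")
```

Reordering the `team` list changes who takes which shift, which mirrors the order-based assignment described above.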

Key Functions

Timely Notifications:
- Connect services to on-call schedules for prompt incident alerts to designated engineers.
Sequential Escalation:
- Configure alerts to escalate in a predefined order and time frame, ensuring prompt resolution by the appropriate personnel.
Customization:
- Tailor escalation policies to specific service requirements, optimizing incident management across diverse portfolios.

Implementing effective escalation policies empowers organizations to proactively address incidents, minimize downtime, and maintain service reliability, ultimately enhancing overall operational resilience.
