6.5 KiB
6.5 KiB
1. Prerequisites
Before proceeding with the steps in this document, ensure your service has integrated Prometheus metrics collection. For details, refer to prometheus-metrics-intergration-guideline.md
2. Prometheus Alert Rule Configuration
2.1. Add prometheusrule.yaml to <helm-pkg>/templates.
Example:
Update the metrics configuration to your service name. See
freeleaps-ops/freeleaps/helm-pkg/metrics.
{{- /*
Copyright Broadcom, Inc. All Rights Reserved.
SPDX-License-Identifier: APACHE-2.0
*/}}
{{- if .Values.metrics.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: {{ .Values.metrics.prometheusRule.name }}
namespace: {{ .Values.metrics.prometheusRule.namespace | quote }}
{{- with .Values.metrics.prometheusRule.labels }}
labels:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
groups:
{{- with .Values.metrics.prometheusRule.rules }}
- name: {{ $.Values.metrics.prometheusRule.name }}
rules:
{{- range . }}
- alert: {{ .alert }}
expr: {{ .expr | quote }}
{{- if .for }}
for: {{ .for }}
{{- end }}
{{- if .labels }}
labels:
{{- toYaml .labels | nindent 12 }}
{{- end }}
{{- if .annotations }}
annotations:
{{- toYaml .annotations | nindent 12 }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
2.2. Add prometheusrule configuration to values.{alpha/prod}.yaml
Example:
See
freeleaps-ops/freeleaps/helm-pkg/metrics.
prometheusRule:
name: freepeals-prod-metrics
enabled: true # disable in alpha environment
namespace: freeleaps-monitoring-system
labels:
release: kube-prometheus-stack
rules:
- alert: FreeleapsMetricsServiceDown # Service down alert
expr: up{job="metrics-service"} == 0
for: 1m
labels:
severity: critical # severity: warning/info/critical
service: metrics-service # service name
namespace: freeleaps-prod # namespace of the service
annotations:
summary: Freeleaps Metrics service is down (instance {{ $labels.instance }}) # summary
description: Freeleaps Metrics service has been down for more than 1 minute. # description
runbook_url: https://netorgft10898514.sharepoint.com/:w:/s/FreeleapsEngineeringTeam/EUlvzumTsPxCpPAzI3gm9OIB0DCLTjQzzYVL6VsHYZFjxg?e=0dxVr7 # Runbook url
- alert: FreeleapsMetricsServiceHighErrorRate
expr: rate(http_requests_total{job="metrics-service",status=~"5.."}[5m]) > 0.1
for: 5m
labels:
severity: warning
service: metrics-service
namespace: freeleaps-prod
annotations:
summary: High error rate in freeleaps metrics service (instance {{ $labels.instance }})
description: Freeleaps Metrics service error rate is {{ $value }} errors per second.
runbook_url: https://netorgft10898514.sharepoint.com/:w:/s/FreeleapsEngineeringTeam/EUlvzumTsPxCpPAzI3gm9OIB0DCLTjQzzYVL6VsHYZFjxg?e=0dxVr7
1.3. Verify Alert Rule Configuration is Effective
You can see the newly added rules indicating they are effective
3. Add AlertmanagerConfig (Email Notifications)
3.1 Add AlertmanagerConfig
If there is no
AlertmanagerConfigin the namespace, create one. If it already exists, no action is required.
To create a new AlertmanagerConfig, refer to freeleaps-ops/altermanager/altermanager-config.yaml.
apiVersion: v1
kind: Secret
type: Opaque
metadata:
name: altermanager-email-credentials
namespace: freeleaps-prod # The namespace whose service alerts you want to configure
data:
password: cHducGNya3d0aXp5Z2RoZQ==
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmanager-config
namespace: freeleaps-prod # The namespace whose service alerts you want to configure
spec:
receivers:
# - msteamsConfigs:
# - sendResolved: true
# text: '{{ template "msteams.default.text" . }}'
# title: >-
# {{ if eq .Status "firing" }}🚨 [FIRING] 🔥{{- else -}}🙌 [RESOLVED]
# 🍻{{- end -}}
# webhookUrl:
# key: webhook-url
# name: freeleaps-teams-webhook
# name: ms-teams
- emailConfigs:
- to: "icecheng@mathmast.com" # email recipient
from: "support@freeleaps.com" # email sender
smarthost: "smtp.freeleaps.com:465"
authUsername: "support@freeleaps.com"
authPassword:
name: "altermanager-email-credentials"
key: "password"
authIdentity: "support@freeleaps.com"
requireTLS: false
sendResolved: true
headers: # email Subject configuration
- key: Subject
value: '{{ if eq .Status "firing" }}🚨 Freeleaps Alert: {{ .CommonAnnotations.summary }}{{ else }}✅ Freeleaps Resolved: {{ .CommonAnnotations.summary }}{{ end }}'
html: |- # email content configuration
<h3><strong>{{ if eq .Status "firing" }}🚨 Alert: {{ .CommonAnnotations.summary }}{{ else }}✅ Resolved: {{ .CommonAnnotations.summary }}{{ end }}</strong></h3>
<p><strong>📝 AlertName:</strong> {{ .CommonLabels.alertname }}</p>
<p><strong>🔧 Service:</strong> {{ .CommonLabels.service }}</p>
<p><strong>🔧 Pod:</strong> {{ .CommonLabels.pod }}({{ .CommonLabels.instance }})</p>
<p><strong>🏷️ Severity:</strong> {{ .CommonLabels.severity }}</p>
<p><strong>{{ if eq .Status "firing" }}🔴 Status:{{ else }}🟢 Status:{{ end }}</strong> {{ .Status | toUpper }}</p>
<p>📝 Description: {{ .CommonAnnotations.description }}</p>
<p>📖 Runbook: <a href="{{ .CommonAnnotations.runbook_url }}">{{ .CommonAnnotations.runbook_url }}</a></p>
name: email
route:
groupBy:
- severity
groupInterval: 5m
receiver: email
groupWait: 5m
repeatInterval: 6h
3.2. Verify Configuration Success
Trigger an alert and check the pages below for alert data. If present, the configuration is successful.
3.3. Verify Email Notification Success
4. Teams Alert Integration
TODO





