Longhorn, Enterprise-Grade Cloud-Native Distributed Container Storage: Monitoring
Published: 2021-11-03 13:46
Source: 黑客下午茶
Author: 为少
This document provides an example setup for monitoring Longhorn. The monitoring system uses Prometheus for collecting data and alerting, and Grafana for visualizing/dashboarding the collected data.
Contents

- Set up Prometheus and Grafana to monitor Longhorn
- Integrate Longhorn metrics into the Rancher monitoring system
- Longhorn monitoring metrics
- Support for Kubelet Volume metrics
- Example Longhorn alert rules
Set up Prometheus and Grafana to monitor Longhorn
Overview
Longhorn natively exposes metrics in Prometheus text format on the REST endpoint http://LONGHORN_MANAGER_IP:PORT/metrics. See Longhorn's metrics for a description of all available metrics. You can use any collecting tool, such as Prometheus, Graphite, or Telegraf, to scrape these metrics, and then visualize the collected data with a tool such as Grafana.
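As a quick sanity check before wiring up Prometheus, you can read that endpoint directly. The sketch below assumes the Longhorn manager's default metrics port 9500, and the sample output is illustrative only:

```bash
# Forward the metrics port of one longhorn-manager pod to your workstation.
kubectl -n longhorn-system port-forward pod/<longhorn-manager-pod-name> 9500:9500 &

# Fetch the metrics in Prometheus text format and pick out one Longhorn metric.
curl -s http://localhost:9500/metrics | grep longhorn_volume_capacity_bytes
# Example (illustrative) output:
# longhorn_volume_capacity_bytes{node="worker-1",volume="pvc-0123"} 2.147483648e+09
```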
This document presents an example setup for monitoring Longhorn. The monitoring system uses Prometheus for collecting data and alerting, and Grafana for visualizing/dashboarding the collected data. At a high level, the monitoring system contains:
- A Prometheus server that scrapes and stores time-series data from the Longhorn metrics endpoint. Prometheus is also responsible for generating alerts based on the configured rules and the collected data. The Prometheus server then sends alerts to Alertmanager.
- Alertmanager, which then manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.
- Grafana, which queries the Prometheus server for data and draws dashboards for visualization.
The figure below describes the detailed architecture of the monitoring system.
There are 2 components in the figure that have not been mentioned yet:
- The Longhorn backend service is the service pointing to the set of Longhorn manager pods. Longhorn's metrics are exposed in the Longhorn manager pods at the endpoint http://LONGHORN_MANAGER_IP:PORT/metrics.
- The Prometheus Operator makes running Prometheus on top of Kubernetes very easy. The operator watches 3 custom resources: ServiceMonitor, Prometheus, and Alertmanager. When users create those custom resources, the Prometheus Operator deploys and manages the Prometheus server and Alertmanager with the user-specified configuration.
Installation
Following these instructions will install all components into the monitoring namespace. To install them into a different namespace, change the field namespace: OTHER_NAMESPACE.
Create the monitoring namespace
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```
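Every manifest in this walkthrough is applied the same way; the file name in the command below is arbitrary and just needs to match wherever you saved the snippet:

```bash
kubectl apply -f monitoring-namespace.yaml
```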
Install the Prometheus Operator
Deploy the Prometheus Operator together with the ClusterRole, ClusterRoleBinding, and ServiceAccount it requires.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
- apiGroups:
  - apiextensions.k8s.io
  resourceNames:
  - alertmanagers.monitoring.coreos.com
  - podmonitors.monitoring.coreos.com
  - prometheuses.monitoring.coreos.com
  - prometheusrules.monitoring.coreos.com
  - servicemonitors.monitoring.coreos.com
  - thanosrulers.monitoring.coreos.com
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - alertmanagers/finalizers
  - prometheuses
  - prometheuses/finalizers
  - thanosrulers
  - thanosrulers/finalizers
  - servicemonitors
  - podmonitors
  - prometheusrules
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - services/finalizers
  - endpoints
  verbs:
  - get
  - create
  - update
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/name: prometheus-operator
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/name: prometheus-operator
        app.kubernetes.io/version: v0.38.3
    spec:
      containers:
      - args:
        # The operator arguments were lost in the original article's formatting; the values
        # below are the standard ones from the upstream prometheus-operator v0.38.3 bundle.
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=jimmidyson/configmap-reload:v0.3.0
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.38.3
        image: quay.io/prometheus-operator/prometheus-operator:v0.38.3
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          allowPrivilegeEscalation: false
      nodeSelector:
        beta.kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: prometheus-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.3
  name: prometheus-operator
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    targetPort: http
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
```
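Once the operator pod is running, it should have registered the monitoring.coreos.com custom resource definitions that the rest of this setup relies on. A quick check (output illustrative):

```bash
kubectl -n monitoring get deploy prometheus-operator
kubectl get crd | grep monitoring.coreos.com
# alertmanagers.monitoring.coreos.com
# prometheuses.monitoring.coreos.com
# prometheusrules.monitoring.coreos.com
# servicemonitors.monitoring.coreos.com
# ...
```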
Install the Longhorn ServiceMonitor
The Longhorn ServiceMonitor has a label selector app: longhorn-manager to select the Longhorn backend service. Later, the Prometheus CRD can include the Longhorn ServiceMonitor so that the Prometheus server can discover all Longhorn manager pods and their endpoints.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: monitoring
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```
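If Longhorn targets do not show up in Prometheus later, a good first check is that the selector above actually matches a service in your cluster. With a default Longhorn installation, the backend service carries the app: longhorn-manager label (output illustrative):

```bash
kubectl -n longhorn-system get svc -l app=longhorn-manager
# NAME               TYPE        CLUSTER-IP   PORT(S)    AGE
# longhorn-backend   ClusterIP   10.43.x.x    9500/TCP   ...
```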
Install and configure Prometheus Alertmanager
Create a highly available Alertmanager deployment with 3 instances:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: longhorn
  namespace: monitoring
spec:
  replicas: 3
```
The Alertmanager instances will not start unless a valid configuration is provided. See here for more explanation of the Alertmanager configuration. The code below gives an example configuration:
```yaml
global:
  resolve_timeout: 5m
route:
  group_by: [alertname]
  receiver: email_and_slack
receivers:
- name: email_and_slack
  email_configs:
  - to: <the email address to send notifications to>
    from: <the sender address>
    smarthost: <the SMTP host through which emails are sent>
    # SMTP authentication information.
    auth_username: <the username>
    auth_identity: <the identity>
    auth_password: <the password>
    headers:
      subject: 'Longhorn-Alert'
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
  slack_configs:
  - api_url: <the Slack webhook URL>
    channel: <the channel or user to send notifications to>
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
```
Save the above Alertmanager configuration in a file called alertmanager.yaml and create a secret from it using kubectl.
Alertmanager instances require the secret resource naming to follow the format alertmanager-{ALERTMANAGER_NAME}. In the previous step the name of the Alertmanager is longhorn, so the secret name must be alertmanager-longhorn:
```bash
$ kubectl create secret generic alertmanager-longhorn --from-file=alertmanager.yaml -n monitoring
```
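If you change alertmanager.yaml later, the same secret can be regenerated in place with a standard kubectl pattern (shown as a sketch; the operator's config-reloader sidecar should then pick up the change):

```bash
kubectl -n monitoring create secret generic alertmanager-longhorn \
  --from-file=alertmanager.yaml --dry-run=client -o yaml | kubectl apply -f -
```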
To be able to view the web UI of Alertmanager, expose it through a Service. A simple way to do this is with a Service of type NodePort:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-longhorn
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: longhorn
```
After creating the above service, you can access the web UI of Alertmanager via a node's IP and port 30903.
Use the above NodePort service for quick verification only, because it does not communicate over a TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose the web UI of Alertmanager over a TLS connection.
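As a sketch of that more secure approach, the Ingress below assumes an NGINX ingress controller, a hostname of your own, and a TLS secret named alertmanager-tls; all three are placeholders rather than part of the original setup, and the Service above is assumed to have been switched to type ClusterIP:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager-longhorn
  namespace: monitoring
spec:
  ingressClassName: nginx                  # name of your ingress controller class (placeholder)
  tls:
  - hosts:
    - alertmanager.example.com             # replace with your own hostname
    secretName: alertmanager-tls           # TLS certificate secret for that hostname (placeholder)
  rules:
  - host: alertmanager.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-longhorn    # the Service defined above
            port:
              name: web
```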
Install and configure the Prometheus server
Create a PrometheusRule custom resource that defines the alert conditions.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: longhorn
    role: alert-rules
  name: prometheus-longhorn-rules
  namespace: monitoring
spec:
  groups:
  - name: longhorn.rules
    rules:
    - alert: LonghornVolumeUsageCritical
      annotations:
        description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is at {{$value}}% used for
          more than 5 minutes.
        summary: Longhorn volume capacity is over 90% used.
      expr: 100 * (longhorn_volume_usage_bytes / longhorn_volume_capacity_bytes) > 90
      for: 5m
      labels:
        issue: Longhorn volume {{$labels.volume}} usage on {{$labels.node}} is critical.
        severity: critical
```
See https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules for more information on how to define alert rules.
If RBAC authorization is activated, create a ClusterRole and ClusterRoleBinding for the Prometheus pods:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# Note: the rbac.authorization.k8s.io/v1beta1 API used below was removed in Kubernetes v1.22;
# on newer clusters use rbac.authorization.k8s.io/v1 instead.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: monitoring
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
```
Create a Prometheus custom resource. Notice that we select the Longhorn service monitor and the Longhorn rules in the spec.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-longhorn
      port: web
  serviceMonitorSelector:
    matchLabels:
      name: longhorn-prometheus-servicemonitor
  ruleSelector:
    matchLabels:
      prometheus: longhorn
      role: alert-rules
```
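The operator reacts to this resource by creating a StatefulSet for the Prometheus server in the monitoring namespace (named prometheus-prometheus here, after the resource name). A quick check that both replicas came up (output illustrative):

```bash
kubectl -n monitoring get statefulset,pods -l prometheus=prometheus
# statefulset.apps/prometheus-prometheus   2/2
# pod/prometheus-prometheus-0              Running
# pod/prometheus-prometheus-1              Running
```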
To be able to view the web UI of the Prometheus server, expose it through a Service. A simple way to do this is with a Service of type NodePort:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30904
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
```
After creating the above service, you can access the web UI of the Prometheus server via a node's IP and port 30904.
At this point, you should be able to see all Longhorn manager targets as well as the Longhorn rules in the targets and rules sections of the Prometheus server UI.
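Besides the web UI, the same information is available through the Prometheus HTTP API exposed on that NodePort, which is convenient for scripted checks (replace NODE_IP; the metric name comes from Longhorn's metric set, and the output is illustrative):

```bash
# Summarize the health of all scrape targets; the Longhorn manager pods should report "up".
curl -s http://NODE_IP:30904/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c

# Query one Longhorn metric to confirm data is flowing.
curl -s 'http://NODE_IP:30904/api/v1/query?query=longhorn_volume_capacity_bytes'
```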
Use the above NodePort service for quick verification only, because it does not communicate over a TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose the web UI of the Prometheus server over a TLS connection.
Install Grafana
Create the Grafana datasource ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  prometheus.yaml: |-
    {
        "apiVersion": 1,
        "datasources": [
            {
                "access": "proxy",
                "editable": true,
                "name": "prometheus",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://prometheus:9090",
                "version": 1
            }
        ]
    }
```
Create the Grafana deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:7.1.5
        ports:
        - name: grafana
          containerPort: 3000
        resources:
          limits:
            memory: "500Mi"
            cpu: "300m"
          requests:
            memory: "500Mi"
            cpu: "200m"
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
      volumes:
      - name: grafana-storage
        emptyDir: {}
      - name: grafana-datasources
        configMap:
          defaultMode: 420
          name: grafana-datasources
```
Expose Grafana on NodePort 32000:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
    nodePort: 32000
```
Use the above NodePort service for quick verification only, because it does not communicate over a TLS connection. You may want to change the service type to ClusterIP and set up an Ingress controller to expose Grafana over a TLS connection.
Access the Grafana dashboard using any node IP on port 32000. The default credentials are:
- User: admin
- Pass: admin
Install the Longhorn dashboard
Once inside Grafana, import the prebuilt Longhorn dashboard: https://grafana.com/grafana/dashboards/13032
See https://grafana.com/docs/grafana/latest/reference/export_import/ for instructions on how to import a Grafana dashboard.
When successful, you should see the Longhorn dashboard.
Integrate Longhorn metrics into the Rancher monitoring system
About the Rancher monitoring system
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution.
See https://rancher.com/docs/rancher/v2.x/en/monitoring-alerting/ for instructions on how to deploy/enable the Rancher monitoring system.
Add Longhorn metrics to the Rancher monitoring system
If you are using Rancher to manage your Kubernetes cluster and have already enabled Rancher monitoring, you can add Longhorn metrics to Rancher monitoring by simply deploying the following ServiceMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: longhorn-system
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```
Once the ServiceMonitor is created, Rancher will automatically discover all Longhorn metrics.
You can then set up Grafana dashboards for visualization.
Longhorn monitoring metrics
Longhorn's metrics fall into the following categories (see Longhorn's metrics reference for the full list of metric names in each category):
- Volume
- Node
- Disk
- Instance Manager
- Manager
Support for Kubelet Volume metrics
About Kubelet Volume metrics
The kubelet exposes the following metrics:
- kubelet_volume_stats_capacity_bytes
- kubelet_volume_stats_available_bytes
- kubelet_volume_stats_used_bytes
- kubelet_volume_stats_inodes
- kubelet_volume_stats_inodes_free
- kubelet_volume_stats_inodes_used
These metrics measure information related to the PVC filesystem inside a Longhorn block device.
They are different from the longhorn_volume_* metrics, which measure information specific to the Longhorn block device itself.
You can set up a monitoring system that scrapes the kubelet metrics endpoint to obtain a PVC's status and set up alerts for abnormal events, such as a PVC being about to run out of storage space.
A popular monitoring setup is prometheus-operator/kube-prometheus-stack, which scrapes the kubelet_volume_stats_* metrics and provides dashboards and alert rules for them.
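As an illustration of such a PVC-nearly-full alert, the rule below could be added to a PrometheusRule group like the ones in the next section; the 10% threshold and the alert name are arbitrary choices, not taken from any upstream rule set:

```yaml
    - alert: PersistentVolumeClaimFillingUp
      annotations:
        description: PVC {{$labels.namespace}}/{{$labels.persistentvolumeclaim}} has less than 10% of its space left.
        summary: A PersistentVolumeClaim is almost full.
      expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10
      for: 5m
      labels:
        severity: warning
```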
Longhorn CSI plugin support
In v1.1.0, the Longhorn CSI plugin supports the NodeGetVolumeStats RPC according to the CSI spec.
This allows the kubelet to query the Longhorn CSI plugin for a PVC's status.
The kubelet then exposes that information in the kubelet_volume_stats_* metrics.
Example Longhorn alert rules
We provide a couple of example Longhorn alert rules below for your reference. See here for a list of all available Longhorn metrics and build your own alert rules.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: longhorn
    role: alert-rules
  name: prometheus-longhorn-rules
  namespace: monitoring
spec:
  groups:
  - name: longhorn.rules
    rules:
    - alert: LonghornVolumeActualSpaceUsedWarning
      annotations:
        description: The actual space used by Longhorn volume {{$labels.volume}} on {{$labels.node}} is at {{$value}}% capacity for
          more than 5 minutes.
        summary: The actual used space of Longhorn volume is over 90% of the capacity.
      expr: (longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes) * 100 > 90
      for: 5m
      labels:
        issue: The actual used space of Longhorn volume {{$labels.volume}} on {{$labels.node}} is high.
        severity: warning
    - alert: LonghornVolumeStatusCritical
      annotations:
        description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is Fault for
          more than 5 minutes.
        summary: Longhorn volume {{$labels.volume}} is Fault
      expr: longhorn_volume_robustness == 3
      for: 5m
      labels:
        issue: Longhorn volume {{$labels.volume}} is Fault.
        severity: critical
    - alert: LonghornVolumeStatusWarning
      annotations:
        description: Longhorn volume {{$labels.volume}} on {{$labels.node}} is Degraded for
          more than 5 minutes.
        summary: Longhorn volume {{$labels.volume}} is Degraded
      expr: longhorn_volume_robustness == 2
      for: 5m
      labels:
        issue: Longhorn volume {{$labels.volume}} is Degraded.
        severity: warning
    - alert: LonghornNodeStorageWarning
      annotations:
        description: The used storage of node {{$labels.node}} is at {{$value}}% capacity for
          more than 5 minutes.
        summary: The used storage of node is over 70% of the capacity.
      expr: (longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes) * 100 > 70
      for: 5m
      labels:
        issue: The used storage of node {{$labels.node}} is high.
        severity: warning
    - alert: LonghornDiskStorageWarning
      annotations:
        description: The used storage of disk {{$labels.disk}} on node {{$labels.node}} is at {{$value}}% capacity for
          more than 5 minutes.
        summary: The used storage of disk is over 70% of the capacity.
      expr: (longhorn_disk_usage_bytes / longhorn_disk_capacity_bytes) * 100 > 70
      for: 5m
      labels:
        issue: The used storage of disk {{$labels.disk}} on node {{$labels.node}} is high.
        severity: warning
    - alert: LonghornNodeDown
      annotations:
        description: There are {{$value}} Longhorn nodes which have been offline for more than 5 minutes.
        summary: Longhorn nodes are offline
      expr: longhorn_node_total - (count(longhorn_node_status{condition="ready"}==1) OR on() vector(0)) > 0
      for: 5m
      labels:
        issue: There are {{$value}} Longhorn nodes offline
        severity: critical
    - alert: LonghornInstanceManagerCPUUsageWarning
      annotations:
        description: Longhorn instance manager {{$labels.instance_manager}} on {{$labels.node}} has CPU usage / CPU request at {{$value}}% for
          more than 5 minutes.
        summary: Longhorn instance manager {{$labels.instance_manager}} on {{$labels.node}} has CPU usage / CPU request over 300%.
      expr: (longhorn_instance_manager_cpu_usage_millicpu/longhorn_instance_manager_cpu_requests_millicpu) * 100 > 300
      for: 5m
      labels:
        issue: Longhorn instance manager {{$labels.instance_manager}} on {{$labels.node}} consumes 3 times the CPU request.
        severity: warning
    - alert: LonghornNodeCPUUsageWarning
      annotations:
        description: Longhorn node {{$labels.node}} has CPU usage / CPU capacity at {{$value}}% for
          more than 5 minutes.
        summary: Longhorn node {{$labels.node}} experiences high CPU pressure for more than 5m.
      expr: (longhorn_node_cpu_usage_millicpu / longhorn_node_cpu_capacity_millicpu) * 100 > 90
      for: 5m
      labels:
        issue: Longhorn node {{$labels.node}} experiences high CPU pressure.
        severity: warning
```
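Before loading new or modified rules into the cluster, it can help to lint them locally with promtool, which ships with Prometheus. It expects a plain rule file, so save only the content under spec.groups (starting at groups:) to a file; the file name below is arbitrary:

```bash
promtool check rules longhorn-rules.yaml
```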
See https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules for more information on how to define alert rules.
Original article: https://mp.weixin.qq.com/s/znaf4v3OBdGrLp0j23BcaQ