
K8S: Kubernetes with Alertmanager for alert notifications, and monitoring a load balancer with haproxy_exporter



Prometheus system architecture diagram

ynphfajkmzq6076.png

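How Prometheus triggers an alert:

Prometheus -> threshold crossed -> condition holds longer than the configured duration -> Alertmanager -> grouping | inhibition | silencing -> media type -> email | DingTalk | WeChat, etc.

Grouping (group): merges alerts of a similar nature into a single notification.

Silencing (silence): a simple mechanism for muting alerts during a given time window. For example, before upgrading or servicing a server, you can silence alerts for that maintenance window.

Inhibition (inhibit): once an alert has fired, Alertmanager stops sending the other alerts that it causes. In other words, the multiple alert events triggered by a single fault are collapsed into one, which removes redundant alerts.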
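To make the grouping step concrete, here is a minimal Python sketch of the idea, not Alertmanager's actual implementation. It assumes alerts are grouped on the `alertname` label, as in the `group_by` setting of the configuration later in this post. The alert dictionaries and container names are made up for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        # A missing label is treated as an empty string.
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts, for illustration only.
alerts = [
    {"labels": {"alertname": "Pod_all_cpu_usage", "name": "web-1"}},
    {"labels": {"alertname": "Pod_all_cpu_usage", "name": "web-2"}},
    {"labels": {"alertname": "Pod_all_memory_usage", "name": "db-1"}},
]

# Each group would become a single notification.
for key, members in group_alerts(alerts, ["alertname"]).items():
    names = [a["labels"]["name"] for a in members]
    print(key, "->", len(members), "alert(s):", names)
```

With these three sample alerts, the two CPU alerts end up in one group and the memory alert in another, so two notifications would go out instead of three.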

g1v3fubo1dg6077.png

Download the alerting component, Alertmanager

# pwd

/usr/local/src

# tar xvf alertmanager-0.18.0.linux-amd64.tar.gz

# ln -sv /usr/local/src/alertmanager-0.18.0.linux-amd64 /usr/local/alertmanager

# cd /usr/local/alertmanager

Configure Alertmanager

Official configuration documentation: https://prometheus.io/docs/alerting/configuration/

# pwd

/usr/local/alertmanager

# cat alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '2973707860@qq.com'
  smtp_auth_username: '2973707860@qq.com'
  smtp_auth_password: 'udwthyyxstcdhcj'
  smtp_hello: '@qq.com'
  smtp_require_tls: false
route: # route defines how alerts are dispatched
  group_by: ['alertname'] # label used as the grouping key
  group_wait: 10s # after an alert fires, wait 10s so that other alerts in the same group go out together
  group_interval: 10s # interval between successive notifications for the same group
  repeat_interval: 2m # interval before a repeated alert is sent again; reduces duplicate emails
  receiver: 'web.hook' # receiver to use
receivers:
- name: 'web.hook'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
  - to: '[email protected]'
inhibit_rules: # inhibition rules
  - source_match: # match on the source alert
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
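The inhibition rule above can be read as: a `warning` alert is muted while a `critical` alert with the same `alertname`, `dev` and `instance` labels is firing. The Python sketch below illustrates that matching logic under simplifying assumptions (exact string matching only, missing labels treated as empty). It is not Alertmanager's code, and the sample alerts are hypothetical.

```python
def matches(labels, matchers):
    """True if every matcher label has exactly the expected value."""
    return all(labels.get(k, "") == v for k, v in matchers.items())


def is_inhibited(target, active_alerts, rule):
    """True if some other active alert inhibits `target` under `rule`."""
    if not matches(target, rule["target_match"]):
        return False
    for source in active_alerts:
        if source is target:
            continue  # an alert does not inhibit itself in this sketch
        if not matches(source, rule["source_match"]):
            continue
        # Source and target must agree on every label listed in `equal`.
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

# Hypothetical alerts, represented by their label sets.
critical = {"alertname": "NodeDown", "instance": "192.168.7.110:9100", "severity": "critical"}
warning_same = {"alertname": "NodeDown", "instance": "192.168.7.110:9100", "severity": "warning"}
warning_other = {"alertname": "NodeDown", "instance": "192.168.7.111:9100", "severity": "warning"}
active = [critical, warning_same, warning_other]

for alert in active:
    print(alert["severity"], alert["instance"], "inhibited:", is_inhibited(alert, active, rule))
```

Only the warning on the same instance as the critical alert is inhibited. The warning on the other instance still produces a notification because its `instance` label differs.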

Start the Alertmanager service

Start from the binary:

# ./alertmanager --config.file=./alertmanager.yml

Startup script (systemd unit)

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

Verify that Alertmanager is listening on port 9093

# lsof -i:9093
COMMAND      PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
alertmana 127083 root   6u   IPv6 8581566      0t0  TCP *:9093 (LISTEN)
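If `lsof` is not available, the same check can be approximated from any machine that can reach the host. This is a small sketch using only the Python standard library. The helper name `is_port_open` is made up here, and the host and port are assumptions that should be adjusted to the environment.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumes Alertmanager runs locally on its default port 9093.
    print("alertmanager listening:", is_port_open("127.0.0.1", 9093))
```

A successful connection only shows that something is listening on the port. It does not confirm that the listener is Alertmanager.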

Alertmanager dashboard screenshot

jjr0nxfqvpk6078.png

Configure Prometheus alerting rules

# cd /usr/local/prometheus
# vim prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.7.102:9093 # Alertmanager address
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/rule-linux36.yml" # rule file to load
  # - "second_rules.yml"

 

Create the alerting rules file

# pwd
/usr/local/prometheus
# cat /usr/local/prometheus/rule-linux36.yml
groups:
  - name: linux37_pod.rules
    rules:
    # Note: the expression fires above 10, while the description text says 75%.
    - alert: Pod_all_cpu_usage
      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 10
      for: 5m
      labels:
        severity: critical
        service: pods
      annotations:
        description: Container {{ $labels.name }} CPU utilization is above 75% (current value is {{ $value }})
        summary: Dev CPU load alert
    # Note: 1024*10^3*2 evaluates to 2,048,000, not 2G as the description states.
    - alert: Pod_all_memory_usage
      expr: sort_desc(avg by(name)(irate(container_memory_usage_bytes{name!=""} [5m]))*100) > 1024*10^3*2
      for: 10m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} memory usage is above 2G (current value is {{ $value }})
        summary: Dev Memory load alert
    - alert: Pod_all_network_receive_usage
      expr: sum by (name) (irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 1024*1024*50
      for: 10m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} network_receive usage is above 50M (current value is {{ $value }})
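As a rough illustration of what the CPU rule computes, the sketch below derives a per-second rate from counter samples and multiplies it by 100, which is the value compared against the threshold. It is deliberately simplified: Prometheus's `rate()` also handles counter resets and extrapolates to the window edges, and the rule additionally sums over containers with the same `name`. The sample values are invented.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


CPU_THRESHOLD = 10  # the value used in the rule expression (> 10)

# Hypothetical container_cpu_usage_seconds_total samples over a 5m window:
# (seconds since window start, cumulative CPU seconds)
samples = [(0, 120.0), (150, 160.0), (300, 195.0)]

cpu_percent = simple_rate(samples) * 100
print("cpu percent:", cpu_percent, "above threshold:", cpu_percent > CPU_THRESHOLD)
```

Here the container used 75 CPU seconds in 300 seconds, so the expression yields 25. The alert only fires once the condition has held for the rule's `for: 5m` duration.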

 

Validate the alerting rules

# pwd
/usr/local/prometheus
# Validate the alerting rule configuration:
# ./promtool check rules rule-linux36.yml # check that the rule file is valid
Checking rule-linux36.yml
SUCCESS: 3 rules found

Restart Prometheus

systemctl restart prometheus

Verify that the alerting rule matches

# pwd
/usr/local/alertmanager
# ./amtool alert --alertmanager.url=http://192.168.7.102:9093
Alertname          Starts At                Summary
Pod_all_cpu_usage  2019-08-07 07:39:04 CST  Dev CPU load alert

Prometheus home page status

f1uhab1bhkq6079.png

 

Verify the alerting rules in the Prometheus web UI

Status -> Rules

tdzxzrf2xlt6080.png

 

Verify the alert email received

qkbmfd3q5cx6081.png

 

Monitoring HAProxy with Prometheus

Deploy haproxy_exporter

# pwd
/usr/local/src
# tar xvf haproxy_exporter-0.9.0.linux-amd64.tar.gz
# ln -sv /usr/local/src/haproxy_exporter-0.9.0.linux-amd64 /usr/local/haproxy_exporter
# cd /usr/local/haproxy_exporter
# ./haproxy_exporter --haproxy.scrape-uri=unix:/run/haproxy/admin.sock
# ./haproxy_exporter --haproxy.scrape-uri="http://haadmin:[email protected]:9999/haproxy-status;csv" &

Verify the data in the web UI

lrumdyvshcl6082.png
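The page shown above is the exporter's metrics output in the Prometheus text format. As a rough sketch of how that output is structured, the Python snippet below parses a small sample into a dictionary. The sample text is invented for illustration, the metric names are assumed to match what haproxy_exporter exposes, and the parser ignores optional timestamps and escaping rules, so it is not a full implementation of the format.

```python
def parse_metrics(text):
    """Parse 'series value' lines of the text format into {series: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical exporter output, for illustration only.
SAMPLE = """\
# TYPE haproxy_up gauge
haproxy_up 1
# TYPE haproxy_backend_up gauge
haproxy_backend_up{backend="web"} 1
"""

metrics = parse_metrics(SAMPLE)
for series, value in metrics.items():
    print(series, "=", value)
```

In this sample, a value of 1 for `haproxy_up` would mean the exporter's last scrape of HAProxy succeeded, which is a quick first thing to check before adding the target to Prometheus.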

Add HAProxy scraping on the Prometheus server

# vim /usr/local/prometheus/prometheus.yml
# cd /usr/local/prometheus/
# grep -v "#" prometheus.yml | grep -v "^$"
global:
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["192.168.7.102:9093"]
rule_files:
  - "/usr/local/prometheus/rule-linux36.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'promethues-node'
    static_configs:
    - targets: ['192.168.7.110:9100','192.168.7.111:9100']
  - job_name: 'prometheus-containers'
    static_configs:
    - targets: ["192.168.7.110:8080","192.168.7.111:8080"]
  - job_name: 'prometheus-haproxy'
    static_configs:
    - targets: ["192.168.7.108:9101"]

Restart Prometheus

systemctl restart prometheus

Add dashboard templates in Grafana

Template IDs: 367 and 2428

pdnkfqybtdx6083.png

 

Verify the HAProxy monitoring data

jmg2l5f2m4n6084.png

 
