Prometheus Monitoring and Alerting

Posted by zengchengjie on Friday, April 15, 2022

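Reference tutorial: prometheus-book

Architecture

Before deploying and using Prometheus, it helps to understand its architecture, which makes later use and operations easier.

The architecture diagram from the official website shows the following components:

  • Prometheus server: scrapes targets, then stores and processes the collected data

  • Node exporter: collects host metrics

  • Grafana: data visualization

  • PromQL: the query language provided by Prometheus

  • TSDB: the time series database used for storage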

Deployment

Prometheus can be deployed either from binary files or with Docker. This article uses Docker.

Binary deployment

Non-Docker users can find the latest Prometheus Server package at https://prometheus.io/download/:

export VERSION=2.4.3
curl -LO  https://github.com/prometheus/prometheus/releases/download/v$VERSION/prometheus-$VERSION.darwin-amd64.tar.gz

Extract the archive and add the Prometheus commands to the system PATH:

tar -xzf prometheus-${VERSION}.darwin-amd64.tar.gz
cd prometheus-${VERSION}.darwin-amd64

After extraction, the current directory contains the default Prometheus configuration file prometheus.yml:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

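As a time series database, Prometheus stores the data it collects as files on local disk. The default storage path is data/. The directory can be created manually first (this step can be skipped when using the default path):

mkdir -p data

The local storage path can also be changed with the --storage.tsdb.path="data/" flag.

Start the Prometheus service. By default it loads the prometheus.yml file in the current directory: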

./prometheus

Deploying with Docker

  • Create the configuration file

    Contents of prometheus.yml (official example):

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    
    # Alertmanager configuration
    #alerting:
    #  alertmanagers:
    #    - static_configs:
    #        - targets:
              # - alertmanager:9093
    
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"
    
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
    
        static_configs:
          - targets: ["localhost:9090"]
    
  • docker run:

docker run -p 9090:9090 -v /home/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Alternatively, create a docker-compose.yml file:

version: '3.3'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    privileged: true
    user: root
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
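
Command-line flags such as the storage path can also be passed through the compose file. The fragment below is a minimal sketch, assuming the stock prom/prometheus image layout (configuration at /etc/prometheus/prometheus.yml, data under /prometheus). The named volume prometheus_data and the 15d retention value are illustrative choices, not part of the setup above.

```yaml
version: '3.3'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      # Mount the configuration read-only
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # Hypothetical named volume so TSDB data survives container recreation
      - prometheus_data:/prometheus
    command:
      # Setting "command" replaces the image's default arguments,
      # so the config file flag has to be repeated here
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      # Illustrative retention period
      - "--storage.tsdb.retention.time=15d"

volumes:
  prometheus_data: {}
```

With this variant the privileged and root settings may not be needed, since the named volume takes on the ownership of the image's /prometheus directory when it is first created.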

Deployment on monitored nodes (node exporter):

version: '3.7'
services:
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
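
For Prometheus to collect from this exporter, prometheus.yml needs a matching scrape job. A minimal sketch, assuming the exporter is reachable at 192.168.153.118:9100 (a placeholder address; replace it with your own host) and using a hypothetical job name:

```yaml
scrape_configs:
  # Hypothetical job name; it becomes the job="node" label on every series
  - job_name: "node"
    static_configs:
      # Placeholder address of the host running node exporter
      - targets: ["192.168.153.118:9100"]
```

Once Prometheus reloads its configuration, the target should appear on the Status > Targets page of the Prometheus UI.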

Complete deployment of Prometheus, Grafana, and node exporter

version: '3.7'
services:
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    networks:
      - prom

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - type: bind
        source: ./prometheus/prometheus.yml
        target: /etc/prometheus/prometheus.yml
        read_only: true
      - type: volume
        source: prometheus
        target: /prometheus
    ports:
      - "9090:9090"
    networks:
      - prom

  grafana:
    depends_on:
      - prometheus
    image: grafana/grafana:latest
    volumes:
      - type: volume
        source: grafana
        target: /var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - prom

volumes:
  prometheus:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/dmgeo/prom/prometheus/data
  grafana:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/dmgeo/prom/grafana

networks:
  prom:
    driver: bridge

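Deploying Consul

When Consul is used for service discovery, a registration request has to be sent from a terminal to register the service with Consul. Example requests are shown below:

Register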

curl -X PUT   --header 'X-Consul-Token: f29e2e07-62f1-4f99-ad68-9ed21e268e15' -d '{"id": "node-exporter-192.168.153.118","name": "node-exporter","address": "192.168.153.118","port": 9100,"tags": ["@999@"],"checks": [{"http": "http://192.168.153.118:9100/metrics", "interval": "5s"}]}'  http://192.168.179.58:8500/v1/agent/service/register

Deregister

curl -X PUT --header 'X-Consul-Token: f29e2e07-62f1-4f99-ad68-9ed21e268e15'  http://192.168.179.58:8500/v1/agent/service/deregister/node-exporter-192.168.153.118
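
Registering a service only makes it visible in Consul; Prometheus still has to be told to discover targets from there. The fragment below is a sketch of the corresponding prometheus.yml section, assuming the Consul agent address used in the requests above. The job name is hypothetical, the token is a placeholder, and the relabel rule is an optional illustration.

```yaml
scrape_configs:
  # Hypothetical job name
  - job_name: "consul-node-exporter"
    consul_sd_configs:
      - server: "192.168.179.58:8500"
        # Placeholder; use the ACL token your Consul agent expects
        token: "<consul-acl-token>"
        # Only discover instances of the service registered above
        services: ["node-exporter"]
    relabel_configs:
      # Optional: use the Consul service id as the instance label
      - source_labels: [__meta_consul_service_id]
        target_label: instance
```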

File-based discovery
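
With file-based discovery, Prometheus reads its targets from files on disk and picks up changes without a restart. The following is a minimal sketch, not the configuration used in this article: the directory /etc/prometheus/targets, the job name, and the env label are assumptions, and in a Docker deployment the directory would also have to be mounted into the container.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  # Hypothetical job name
  - job_name: "file-sd-node"
    file_sd_configs:
      - files:
          # Assumed location of the target files
          - "/etc/prometheus/targets/*.yml"
        # Re-read the files at least this often, in addition to change detection
        refresh_interval: 1m
---
# /etc/prometheus/targets/node.yml (hypothetical target file)
- targets: ["192.168.10.44:9100"]
  labels:
    env: "test"
```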

Deploying node exporter

curl -X PUT -d '{"id": "node-exporter","name": "node-exporter-192.168.10.44","address": "192.168.10.44","port": 9100,"tags": ["test"],"checks": [{"http": "http://192.168.10.44:9100/metrics", "interval": "5s"}]}'  http://192.168.10.44:8500/v1/agent/service/register

Deploying Grafana

Official Grafana documentation

The default username and password are both admin.
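
After logging in, Prometheus has to be added as a data source. This can be done in the UI, or through Grafana's provisioning mechanism. The file below is a sketch of the latter, assuming Grafana runs in the same compose network as a service named prometheus and that the file is mounted under /etc/grafana/provisioning/datasources/ (the file name itself is arbitrary).

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (assumed path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # "proxy" means the Grafana backend issues the queries, not the browser
    access: proxy
    # Assumes the compose service name "prometheus" resolves on the shared network
    url: http://prometheus:9090
    isDefault: true
```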

Deploying Alertmanager

Forward Alertmanager alert data to an HTTP endpoint
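
Alertmanager can forward alerts to an HTTP endpoint through a webhook receiver. The alertmanager.yml below is a minimal sketch. The receiver name and the URL http://192.168.10.49:8099/alert are hypothetical and stand in for whatever service should receive the alerts.

```yaml
route:
  # Send every alert to the single receiver defined below
  receiver: "webhook"

receivers:
  - name: "webhook"
    webhook_configs:
      # Hypothetical endpoint; Alertmanager POSTs a JSON payload to it
      - url: "http://192.168.10.49:8099/alert"
        # Also notify the endpoint when an alert is resolved
        send_resolved: true
```

Prometheus itself would additionally need its alerting section pointed at the Alertmanager address (for example alertmanager:9093, as in the commented-out sample configuration) and at least one rule file listed under rule_files.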

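Forwarding Prometheus data to other systems

Use the remote_write section to forward data to a target endpoint. It can also filter which metrics are sent. For example, the regex field under write_relabel_configs can be used so that only the node_network_transmit_bytes_total metric is transmitted: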

remote_write:
  - url: "http://192.168.10.49:8099/receive"
    write_relabel_configs:
    - source_labels: [__name__]
      regex: "node_network_transmit_bytes_total"
      action: keep

Integrating with Spring Boot

Prometheus provides an official Spring Boot dependency, but that client no longer supports Spring Boot 2:

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_spring_boot</artifactId>
    <version>0.4.0</version>
</dependency>

Spring Boot 2's Actuator uses Micrometer to collect monitoring data, and Micrometer supports Prometheus, so Spring Boot 2 can be integrated through micrometer-registry-prometheus. Add the following dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
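
Adding the dependencies is not enough on its own, because Spring Boot 2 does not expose the Prometheus endpoint over HTTP by default. A minimal application.yml sketch, assuming the default Actuator base path and that exposing health alongside prometheus is acceptable for your application:

```yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        # Expose the Prometheus scrape endpoint (and health) over HTTP
        include: "health,prometheus"
```

With this setting the metrics are served at /actuator/prometheus, so the matching scrape job in prometheus.yml would need metrics_path: /actuator/prometheus rather than the default /metrics.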