Building a Target Endpoint for Prometheus

Introduction

prometheus_client is a Python library for collecting metrics and exposing them in the Prometheus format. Prometheus is an open-source monitoring and alerting system that collects real-time metrics from applications and services. With prometheus_client, a Python application can easily expose metrics for Prometheus to scrape.

Key features

Supports several metric types, including counters, gauges, summaries, and histograms
Exposes metrics in the Prometheus text format and the OpenMetrics format
Lets you customize metric names and descriptions
Supports labels for grouping metrics
Integrates easily with other libraries and frameworks
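As a rough illustration of the metric types and labels listed above, the sketch below registers one metric of each kind in a private registry and prints the resulting text exposition. The metric names (demo_requests, demo_queue_depth, and so on) are made up for this example and are not part of the exporter built later in this article.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    Summary,
    generate_latest,
)

# A private registry keeps this sketch separate from the global default registry.
registry = CollectorRegistry()

# Hypothetical metrics, one per type.
requests_total = Counter("demo_requests", "Requests handled", ["method"], registry=registry)
queue_depth = Gauge("demo_queue_depth", "Jobs waiting in the queue", registry=registry)
request_seconds = Summary("demo_request_seconds", "Time spent per request", registry=registry)
payload_bytes = Histogram(
    "demo_payload_bytes", "Payload size in bytes",
    buckets=(100, 1000, 10000), registry=registry,
)

requests_total.labels(method="GET").inc()  # counters only go up
queue_depth.set(3)                         # gauges can be set to any value
request_seconds.observe(0.25)              # summaries track count and sum
payload_bytes.observe(512)                 # histograms count observations per bucket

# Render the registry in the Prometheus text format.
text = generate_latest(registry).decode("utf-8")
print(text)
```

A counter is exposed with a _total suffix, so demo_requests appears as demo_requests_total in the output.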

Installation

First install pip, the Python package manager, which lets you install and manage Python libraries and their dependencies.

To install pip on Ubuntu, run the following commands:

sudo apt update
sudo apt install python3-pip

Once pip is installed, install prometheus_client by running:

pip3 install prometheus_client

On other operating systems, the steps for installing pip may differ.
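To confirm that the installation worked, one option is to ask Python whether the module can be found. This is a small sketch using only the standard library; the helper name is_installed is an assumption made for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("prometheus_client installed:", is_installed("prometheus_client"))
```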

The following is a practical example of using the prometheus_client module.

To build an exporter for Prometheus, you need an HTTP server that Prometheus can pull (scrape) metrics from.
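To make the pull model concrete, here is a minimal sketch of what such an endpoint does at the HTTP level, written with the standard library only. It is not how prometheus_client is implemented; the handler class, the demo_up metric, and the scrape_once helper are all hypothetical names chosen for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves one hard-coded gauge in the Prometheus text format."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = (
            "# HELP demo_up Whether the demo endpoint is alive\n"
            "# TYPE demo_up gauge\n"
            "demo_up 1\n"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def scrape_once():
    """Start the server on a free local port, fetch /metrics once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/metrics" % server.server_port
        # Bypass any proxy settings so the request stays on the local machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(scrape_once())
```

In practice prometheus_client takes care of both the HTTP server and the formatting, so this sketch is only meant to show what Prometheus sees when it scrapes a target.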

Below is the example code.

import re
import socket
import subprocess
import time

from prometheus_client import start_http_server, Gauge

# Label values are derived from this host's name and its resolved IP address.
node = socket.gethostname()
instance = socket.gethostbyname(node)

# Define your metrics.
# The 'pe_' prefix on the first gauge (instead of 'pve_') is kept as-is because
# the sample output later in this article was captured with that name.
cpu_load_average_1minute = Gauge('pe_custom_node_cpu_load_average_1minute', 'Description of gauge', ['id', 'instance'])
cpu_load_average_5minute = Gauge('pve_custom_node_cpu_load_average_5minute', 'Description of gauge', ['id', 'instance'])
cpu_load_average_15minute = Gauge('pve_custom_node_cpu_load_average_15minute', 'Description of gauge', ['id', 'instance'])
current_logged_in_users = Gauge('pve_custom_current_logged_in_users', 'Description of gauge', ['id', 'instance'])

def cpu_load_average(node, instance):
    # The first line of `w` holds the uptime, the user count, and the load averages.
    header = subprocess.check_output(["w"], text=True).splitlines()[0]

    # Match both "1 user" and "N users".
    match = re.search(r'(\d+) users?', header)
    users_count = int(match.group(1)) if match else 0

    # Only read the numbers after "load average:" so the uptime is never picked up.
    load_averages = [float(v) for v in re.findall(r'\d+\.\d+', header.split('load average:')[-1])]

    node_id = "node/%s" % node
    cpu_load_average_1minute.labels(id=node_id, instance=instance).set(load_averages[0])
    cpu_load_average_5minute.labels(id=node_id, instance=instance).set(load_averages[1])
    cpu_load_average_15minute.labels(id=node_id, instance=instance).set(load_averages[2])
    current_logged_in_users.labels(id=node_id, instance=instance).set(users_count)

if __name__ == '__main__':
    # Start the HTTP server that exposes the metrics on port 16490.
    start_http_server(16490)
    # Refresh the gauges once per second.
    while True:
        cpu_load_average(node, instance)
        time.sleep(1)

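This code derives the node and instance label values from the machine's hostname and its resolved IP address, so there are no placeholders to replace; if those values do not suit your environment, set node and instance explicitly. Note that Prometheus attaches its own instance label to every scraped target, so with the default honor_labels setting the instance label exposed here is stored as exported_instance.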
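The parsing step in the exporter can be tried on its own, without running `w`. The sketch below isolates it as a function; the name parse_w_header and the sample header line are assumptions made for this illustration, with the line shaped like typical `w` output on Linux.

```python
import re


def parse_w_header(header):
    """Return (logged_in_users, [load1, load5, load15]) from the first line of `w`."""
    match = re.search(r"(\d+) users?", header)
    users = int(match.group(1)) if match else 0
    # Only look at the text after "load average:" so uptime digits are ignored.
    _, _, tail = header.partition("load average:")
    loads = [float(value) for value in re.findall(r"\d+\.\d+", tail)]
    return users, loads


if __name__ == "__main__":
    # Hypothetical header line, not captured from a real host.
    sample = " 10:15:02 up 12 days,  3:41,  8 users,  load average: 1.74, 1.66, 1.61"
    print(parse_w_header(sample))
```

If only the load averages are needed, os.getloadavg() from the standard library returns the same three values on Unix systems without spawning a subprocess.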

After running the code above, you can add a new target to Prometheus with the URL http://<your_server_ip>:16490/metrics, where <your_server_ip> is the IP address of the server running the code.

To declare this endpoint in Prometheus, add a new target to the Prometheus configuration file (prometheus.yml).

scrape_configs:
  - job_name: 'demo-endpoints'
    static_configs:
      - targets: ['192.168.100.201:16490']

Using netstat, you can see that the endpoint's port is listening.

shell> netstat -tlnp  | grep 16490
tcp        0      0 0.0.0.0:16490           0.0.0.0:*               LISTEN      1191446/python3
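The same check can be done from Python, which may be handy on hosts where netstat is not installed. This is a minimal sketch; the helper name port_is_open is an assumption, and it only confirms that something accepts TCP connections on the port, not that it serves metrics.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 16490 is the port used by the exporter in this article.
    print(port_is_open("127.0.0.1", 16490))
```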

Once Prometheus has loaded this configuration, the endpoint is listed on its Targets page.

Result of running curl against the endpoint:

shell> curl http://192.168.100.201:16490/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 408.0
python_gc_objects_collected_total{generation="1"} 207.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 39.0
python_gc_collections_total{generation="1"} 3.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="11",patchlevel="2",version="3.11.2"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.82583296e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.3199744e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.71478961504e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.13
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 11.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP pe_custom_node_cpu_load_average_1minute Description of gauge
# TYPE pe_custom_node_cpu_load_average_1minute gauge
pe_custom_node_cpu_load_average_1minute{id="node/pve01",instance="10.10.10.1"} 1.74
# HELP pve_custom_node_cpu_load_average_5minute Description of gauge
# TYPE pve_custom_node_cpu_load_average_5minute gauge
pve_custom_node_cpu_load_average_5minute{id="node/pve01",instance="10.10.10.1"} 1.66
# HELP pve_custom_node_cpu_load_average_15minute Description of gauge
# TYPE pve_custom_node_cpu_load_average_15minute gauge
pve_custom_node_cpu_load_average_15minute{id="node/pve01",instance="10.10.10.1"} 1.61
# HELP pve_custom_current_logged_in_users Description of gauge
# TYPE pve_custom_current_logged_in_users gauge
pve_custom_current_logged_in_users{id="node/pve01",instance="10.10.10.1"} 8.0
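To work with output like this in a script, the sample lines can be split into a name, a label set, and a value. The sketch below is a simplified parser written for this article's output, not a full implementation of the exposition format: it skips comment lines, ignores lines that carry a trailing timestamp, and does not unescape label values. The function name parse_metrics is an assumption.

```python
import re

# name, optional {labels}, then a single value
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a simple "name{labels} value" line
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    example = (
        "# TYPE pve_custom_current_logged_in_users gauge\n"
        'pve_custom_current_logged_in_users{id="node/pve01",instance="10.10.10.1"} 8.0\n'
    )
    for sample in parse_metrics(example):
        print(sample)
```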

After adding the new target, restart Prometheus so that it loads the updated configuration. The custom metrics can then be queried from Prometheus.
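Besides the web UI, the metrics can be queried through the Prometheus HTTP API at /api/v1/query. The sketch below only builds the request URL and decodes a response body; the helper names are assumptions, http://localhost:9090 is assumed as the Prometheus address, and the JSON payload is a hand-written example in the documented response shape rather than output captured from a real server.

```python
import json
import urllib.parse


def build_query_url(base_url, expression):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expression})
    return "%s/api/v1/query?%s" % (base_url.rstrip("/"), query)


def extract_values(payload):
    """Return a list of (labels_dict, value) pairs from an instant-query response."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise ValueError("query failed: %r" % data.get("error"))
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["data"]["result"]]


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", "pve_custom_current_logged_in_users"))
    # Hypothetical response body, for illustration only.
    sample = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "pve_custom_current_logged_in_users", "id": "node/pve01"},
                 "value": [1714789615.0, "8"]},
            ],
        },
    })
    print(extract_values(sample))
```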

Benefits

Using prometheus_client has several benefits, including:

Metrics are easy to collect and expose
Data is formatted according to the Prometheus standard
It enables effective monitoring and alerting
It can be integrated with other monitoring systems

Tips

Give your metrics descriptive names and labels so they are easy to identify.
Group related metrics to keep your data organized.
Use Prometheus to monitor your application's performance and to be notified of potential problems.
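When choosing names, it can help to check them against the classic character rules from the Prometheus data model before registering a metric. The sketch below encodes those rules as regular expressions; the helper names are assumptions made for this example.

```python
import re

# Classic Prometheus naming rules: metric names may contain colons, label names may not.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    """Return True if the whole string is a valid classic metric name."""
    return METRIC_NAME.fullmatch(name) is not None


def is_valid_label_name(name):
    """Return True for valid label names; names starting with "__" are reserved."""
    return LABEL_NAME.fullmatch(name) is not None and not name.startswith("__")


if __name__ == "__main__":
    print(is_valid_metric_name("pve_custom_node_cpu_load_average_5minute"))
    print(is_valid_metric_name("5minute-load"))
    print(is_valid_label_name("instance"))
```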

Conclusion

prometheus_client is a powerful tool for collecting and exposing metrics for Prometheus. It is easy to use and can help you monitor your application's performance effectively.

Further reading

You can learn more about the Prometheus client libraries at https://prometheus.io/docs/instrumenting/clientlibs/
You can learn more about Prometheus at https://prometheus.io/docs/introduction/overview/

I hope you find this article useful!
