A Complete Example: Collecting Pod CPU and Memory Metrics and Standard Output Logs with Grafana Agent

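The previous two posts, "How to get Pod standard output (stdout) logs, collect them with Grafana Agent, and upload them to Grafana Loki" and "How to get Pod CPU and memory metrics, collect them with Grafana Agent, and upload them to Prometheus", were really one topic that I split into two articles, one per knowledge point.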

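In practice, though, the two usually go together: observability depends on both metrics and logs. When you need both, there is no reason to deploy two DaemonSets. This post combines them into one complete example that you can deploy as is.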

First, the DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: grafana-agent
spec:
  selector:
    matchLabels:
      app: grafana-agent
  template:
    metadata:
      labels:
        app: grafana-agent
    spec:
      serviceAccountName: grafana-agent
      containers:
      - name: grafana-agent
        image: grafana/agent:latest
        command:
          - grafana-agent
          - -config.file=/etc/agent/agent.yaml
          - -metrics.wal-directory=/tmp/metrics
          - -config.expand-env # lets environment variables replace '${xxx}' placeholders in the config file
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: METRICS_PORT
          value: '10255'
        - name: NODE_INTRANET_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        volumeMounts:
          - name: config-volume
            mountPath: /etc/agent
          - name: pod-logs-volume
            mountPath: /var/log/pods
          - name: store-otherdata-volume
            mountPath: /tmp
      volumes:
      - name: config-volume
        configMap:
          name: grafana-agent-config
      - name: pod-logs-volume
        hostPath:
          path: /var/log/pods
      - name: store-otherdata-volume
        hostPath:
          path: /tmp
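
The `NODE_INTRANET_IP` and `METRICS_PORT` entries matter because of `-config.expand-env`: they fill the `${...}` placeholders in the scrape target of the configuration file shown later in this post. As a rough sketch, assuming a node whose host IP is 10.0.0.12 (a made-up address, purely for illustration), the static target would be expanded roughly like this:

```yaml
# Target as written in agent.yaml, before expansion
static_configs:
  - targets:
    - "${NODE_INTRANET_IP}:${METRICS_PORT}"
---
# What the agent effectively sees on a node with host IP 10.0.0.12
# (hypothetical address; METRICS_PORT comes from the DaemonSet env)
static_configs:
  - targets:
    - "10.0.0.12:10255"
```

Since each agent Pod only scrapes the kubelet on its own node, a single static target per Pod is enough.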

Next, the RBAC authorization:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: default
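
Note that the ServiceAccount and the ClusterRoleBinding subject are both pinned to the `default` namespace, while the DaemonSet above declares no namespace of its own. If you deploy into a different namespace, the two need to agree with wherever the DaemonSet runs. A minimal sketch, assuming a hypothetical namespace called `monitoring`:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent
  namespace: monitoring   # hypothetical namespace, not from the original manifests
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: monitoring   # must match the ServiceAccount's namespace
```

The DaemonSet and the ConfigMap would then have to be applied to that same namespace, for example by adding `namespace: monitoring` to their `metadata`.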

Finally, the configuration file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-agent-config
data:
  agent.yaml: |
    server:
      log_level: debug
    metrics:
      global:
        scrape_interval: 1m
      configs:
        - name: agent
          scrape_configs:
          - job_name: kubelet-metrics-resource
            metrics_path: /metrics/resource
            static_configs:
              - targets: 
                - "${NODE_INTRANET_IP}:${METRICS_PORT}"
          remote_write:
            - url: https://<your-prometheus-domain>/api/prom/push
              basic_auth:
                username: <username>
                password: <password>
    logs:
      configs:
      - name: default
        clients:
        - url: https://<username>:<password>@<your-loki-domain>/loki/api/v1/push
        positions: 
          filename: /tmp/logs/positions.yaml
        scrape_configs:
        - job_name: kubernetes-pods
          pipeline_stages:
          - cri: {}
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: __host__
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_container_name
            target_label: container
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_host_ip
            target_label: host_ip
          - action: replace
            target_label: __path__
            source_labels:
            - __meta_kubernetes_pod_uid
            - __meta_kubernetes_pod_container_name
            separator: /
            replacement: "/var/log/pods/*$1/*.log"
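
One caveat about the metrics job: port 10255 is the kubelet's read-only port, and on some clusters it is disabled. In that case a possible alternative is to scrape the authenticated kubelet port instead. The following is only a sketch under assumptions: it presumes the kubelet listens on 10250, accepts ServiceAccount tokens, and serves a certificate that the agent cannot verify, so verification is skipped. The `nodes/metrics` permission in the ClusterRole above is what this request would rely on.

```yaml
# Sketch only: the same job, scraped over the kubelet's authenticated port.
- job_name: kubelet-metrics-resource
  scheme: https
  metrics_path: /metrics/resource
  # Token of the grafana-agent ServiceAccount, mounted into the Pod by Kubernetes
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    # Assumption: the kubelet serving certificate is not verifiable by the agent
    insecure_skip_verify: true
  static_configs:
    - targets:
      # Assumption: 10250 is the kubelet's secure port on this cluster
      - "${NODE_INTRANET_IP}:10250"
```

This entry would replace the existing item under `scrape_configs` in the `metrics` section; the `remote_write` part stays the same.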

For an explanation of each configuration item, see the previous two posts.
