Operational Excellence-Monitoring and Analytics: setting up managed metrics using Prometheus and Grafana on Kubernetes with Service Mesh

Anil Gudigar

Published in DevOps.dev, Mar 20, 2024

Understanding the Components:

1. Prometheus:

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It scrapes metrics from configured targets over HTTP, stores them as time series, and provides a flexible query language (PromQL) to analyze them. Prometheus is a cornerstone in modern monitoring setups due to its simplicity and adaptability.

Metric types collected by Prometheus:

Counter: A cumulative value that only increases (or resets to zero on restart), such as the number of errors or tasks completed.

Gauge: A value that can go up and down, such as the number of concurrent requests or current CPU utilization.

Histogram: Samples observations such as request durations or response sizes into configurable buckets, and exposes a count per bucket along with the total count and sum. It is the best fit when you want to see a distribution, for example request latency segmented by pod across a Kubernetes cluster.

Summary: Like a histogram, a summary samples observations and offers a total count of observations as well as a sum of all observed values. It additionally calculates configurable quantiles over a sliding time window.
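To make the four types concrete, the sketch below writes a small sample in the Prometheus text exposition format, which is what a scrape target serves on its metrics endpoint. The metric names (demo_tasks_completed_total and so on) and all values are made up for illustration.

```shell
# Write an illustrative sample of the Prometheus text exposition format.
# All metric names and values here are hypothetical.
cat > sample-metrics.txt <<'EOF'
# TYPE demo_tasks_completed_total counter
demo_tasks_completed_total 1027

# TYPE demo_concurrent_requests gauge
demo_concurrent_requests 12

# TYPE demo_request_duration_seconds histogram
demo_request_duration_seconds_bucket{le="0.1"} 240
demo_request_duration_seconds_bucket{le="0.5"} 310
demo_request_duration_seconds_bucket{le="+Inf"} 320
demo_request_duration_seconds_sum 48.7
demo_request_duration_seconds_count 320

# TYPE demo_rpc_duration_seconds summary
demo_rpc_duration_seconds{quantile="0.5"} 0.042
demo_rpc_duration_seconds{quantile="0.99"} 0.31
demo_rpc_duration_seconds_sum 17.3
demo_rpc_duration_seconds_count 320
EOF

# Histogram buckets are cumulative: each "le" bucket counts every observation
# at or below its bound, so the +Inf bucket equals the _count series.
# Count the metric families declared in the sample.
grep -c '^# TYPE' sample-metrics.txt
```

In PromQL, a counter like this is usually read through rate(), and histogram buckets are the input to histogram_quantile().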

2. Grafana:

Grafana is a popular open-source analytics and visualization platform that integrates with various data sources, including Prometheus. It enables users to create rich, interactive dashboards to visualize and understand complex data sets better. Grafana’s extensibility and user-friendly interface make it a favored tool for monitoring and observability tasks.

3. Kubernetes:

Kubernetes is an industry-standard platform for automating the deployment, scaling, and management of containerized applications. It provides a robust framework for orchestrating containers, enabling seamless scaling and resilience across distributed environments.

4. Service Mesh:

Service mesh technologies, such as Istio or Linkerd, offer a layer of infrastructure that provides observability, security, and control over the communication between services in a Kubernetes cluster. They facilitate features like traffic management, telemetry collection, and policy enforcement, enhancing the overall resilience and observability of microservices architectures.

Prerequisites:

Access to a Kubernetes cluster (locally or on a cloud provider).
kubectl, Helm, and istioctl installed locally.
Basic understanding of Prometheus, Grafana, Kubernetes, and service mesh concepts.
An xMatters account and API integration set up.

Setup Guide:

Step 1: Deploy Kubernetes Cluster:

Ensure you have a Kubernetes cluster up and running. You can use managed Kubernetes services like Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), or deploy a cluster using tools like kops or kubeadm.

Explore Your Kubernetes Cluster

# View all resources in the default namespace
kubectl get all

# View namespaces
kubectl get namespaces

# View cluster info
kubectl cluster-info

Deploy Applications

# Deploy sample application
kubectl create deployment demo --image=gcr.io/google-samples/hello-app:1.0

# Expose the deployment as a service
kubectl expose deployment demo --type=NodePort --port=8080

Access Your Application

# Get the NodePort of the service
kubectl get svc

# Access the application via browser or curl
# For Minikube
minikube service demo

# For Kind
kubectl port-forward service/demo <local-port>:8080

Step 2: Install Service Mesh:

Choose and install your preferred service mesh (e.g., Istio, Linkerd). Follow the respective installation guides to set up the service mesh data plane and control plane components within your Kubernetes cluster.

Install Istio as the service mesh:

istioctl install --set profile=demo
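Installing the control plane does not add sidecars to workloads that are already running. One common way to bring the demo app into the mesh is to label its namespace for automatic sidecar injection. A minimal sketch, assuming the demo deployment from Step 1 lives in the default namespace (the file name is arbitrary):

```shell
# Namespace manifest that enables Istio's automatic sidecar injection.
# Assumes the demo app runs in the "default" namespace.
cat > default-namespace-injection.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    istio-injection: enabled
EOF

# Against a live cluster, apply the label and recreate the pods so they
# pick up the sidecar (commented out because it needs cluster access):
# kubectl apply -f default-namespace-injection.yaml
# kubectl rollout restart deployment demo
```

The imperative equivalent is kubectl label namespace default istio-injection=enabled. After the restart, kubectl get pods should show two containers per demo pod: the application and the Envoy sidecar.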

Step 3: Deploy Prometheus:

Deploy Prometheus into your Kubernetes cluster using Helm charts or YAML manifests. Ensure Prometheus is configured to scrape relevant metrics endpoints from your services and applications. Integrate Prometheus with your service mesh for enhanced observability.

# Add Prometheus Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# Update Helm repositories
helm repo update

# Install Prometheus using Helm (the chart also bundles Alertmanager and Grafana)
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Step 4: Deploy Grafana:

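Deploy Grafana into your Kubernetes cluster, either using Helm charts or YAML manifests. Note that the kube-prometheus-stack chart from Step 3 already deploys a Grafana instance with Prometheus preconfigured as a data source; install the standalone chart only if you want to manage Grafana separately. Configure Grafana to connect to Prometheus as a data source. Set up dashboards and alerts tailored to your monitoring requirements.

# Add the Grafana Helm repository, then install Grafana
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana --namespace monitoring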

Configure Data Sources and Dashboards

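Find the Grafana service (kubectl get svc -n monitoring). The chart creates a ClusterIP service by default, so forward a local port to reach it: kubectl port-forward svc/grafana 3000:80 -n monitoring.
Log in as admin. The Grafana Helm chart stores a generated admin password in the grafana secret (kubectl get secret grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode); admin/admin applies only to installs that keep Grafana's built-in default.
Navigate to Configuration > Data Sources > Add data source (Connections > Data sources in recent Grafana versions).
Select Prometheus.
Set URL to the in-cluster address of the Prometheus service. With the kube-prometheus-stack release from Step 3 this is typically http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090; http://prometheus-server.monitoring.svc.cluster.local applies to the standalone prometheus chart. Confirm the name with kubectl get svc -n monitoring.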
Click Save & Test.
Import pre-built dashboards from Grafana’s dashboard repository.
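The manual data source steps above can also be captured as Helm values so that the data source is provisioned when Grafana is installed. A sketch, assuming the standalone grafana/grafana chart and a Prometheus service named prometheus-kube-prometheus-prometheus on port 9090 (an assumption; verify the name with kubectl get svc -n monitoring):

```shell
# Helm values that provision a Prometheus data source for the Grafana chart.
# The service URL is an assumption; adjust it to match your cluster.
cat > grafana-values.yaml <<'EOF'
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
        isDefault: true
EOF

# Against a live cluster (commented out because it needs cluster access):
# helm upgrade --install grafana grafana/grafana --namespace monitoring -f grafana-values.yaml
```

Keeping the data source in a values file makes the setup reproducible and avoids repeating the UI steps after a reinstall.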

Step 5: Configure Service Mesh Telemetry:

Enable telemetry features within your service mesh to capture metrics related to service-to-service communication, traffic flows, and latency. Configure Prometheus to scrape these telemetry endpoints for comprehensive monitoring.

Configure Prometheus to Scrape Metrics from Istio

Create a Prometheus configuration file (prometheus-istio.yaml):

global:
  scrape_interval: 15s
  evaluation_interval: 15s

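scrape_configs:
  # Discover every Service endpoint in the cluster
  - job_name: 'istio-mesh'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose Service label sanitizes to istio_io_mesh with value "true"
      - source_labels: [__meta_kubernetes_service_label_istio_io_mesh]
        action: keep
        regex: "true"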
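Store the configuration in a ConfigMap. Creating the ConfigMap does not by itself change what Prometheus scrapes: the Prometheus deployment has to mount and load the file, and with the operator-based kube-prometheus-stack chart from Step 3, scrape jobs are normally added through ServiceMonitor or PodMonitor resources or the chart's additionalScrapeConfigs value.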

kubectl create configmap prometheus-istio-config --from-file=prometheus-istio.yaml -n monitoring
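With the kube-prometheus-stack chart, extra scrape jobs can be passed through the chart's prometheus.prometheusSpec.additionalScrapeConfigs value. The sketch below is one way to do that. The two jobs are modeled on the scrape configuration in Istio's Prometheus integration documentation; treat the port names and the file name as assumptions to check against your Istio version.

```shell
# Helm values adding Istio scrape jobs to the operator-managed Prometheus.
# Job definitions are modeled on Istio's documented Prometheus integration.
cat > prometheus-istio-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      # Istio control plane (istiod) metrics
      - job_name: istiod
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - istio-system
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: istiod;http-monitoring
      # Envoy sidecar metrics from every meshed pod
      - job_name: envoy-stats
        metrics_path: /stats/prometheus
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_container_port_name]
            action: keep
            regex: '.*-envoy-prom'
EOF

# Against a live cluster (commented out because it needs cluster access):
# helm upgrade prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f prometheus-istio-values.yaml
```

Once the upgrade has rolled out, the new jobs should appear under Status > Targets in the Prometheus UI.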

Step 6: Visualize and Analyze:

Access Grafana’s web interface and start building dashboards to visualize metrics collected by Prometheus. Utilize Grafana’s rich set of visualization options and plugins to create insightful representations of your system’s performance.

Step 7: Set Up Alerts:

Define alerting rules within Prometheus to notify you of any abnormal behavior or performance degradation. Prometheus evaluates the rules and hands firing alerts to Alertmanager, which routes them to external notification services like Slack, email, or xMatters for timely responses to incidents.

Define alerting rules in Prometheus configuration for Istio metrics (for example, on error rates derived from istio_requests_total).
Configure the xMatters integration as an Alertmanager receiver for sending alerts.
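As a rough illustration of the two items above, the sketch below writes an alerting rule on Istio's istio_requests_total metric and an Alertmanager receiver that forwards alerts to a webhook. Several things here are assumptions: the rule is packaged as a PrometheusRule resource (which applies to operator-based installs such as kube-prometheus-stack), the release: prometheus label matches that chart's default rule selector for a release named prometheus, the 5% threshold is arbitrary, and the webhook URL is a placeholder for the inbound integration URL from your own xMatters account.

```shell
# Alerting rule: fire when more than 5% of requests to a service return 5xx.
# Alert name, threshold, and duration are illustrative choices.
cat > istio-alert-rules.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: istio-alerts
  namespace: monitoring
  labels:
    release: prometheus   # assumed to match the chart's rule selector
spec:
  groups:
    - name: istio.rules
      rules:
        - alert: IstioHigh5xxRate
          expr: |
            sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service)
              /
            sum(rate(istio_requests_total[5m])) by (destination_service)
              > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High 5xx rate for {{ $labels.destination_service }}"
EOF

# Alertmanager fragment: send every alert to a webhook receiver.
# The URL is a placeholder, not a real xMatters endpoint.
cat > alertmanager-xmatters.yaml <<'EOF'
route:
  receiver: xmatters
receivers:
  - name: xmatters
    webhook_configs:
      - url: "https://example.invalid/xmatters-inbound-integration"
        send_resolved: true
EOF

# Against a live cluster (commented out because it needs cluster access):
# kubectl apply -f istio-alert-rules.yaml
```

How the Alertmanager fragment is loaded depends on how Alertmanager is deployed, so it is shown here as a plain configuration file and not tied to a specific chart value.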

Step 8: Continuous Optimization:

Regularly review and optimize your monitoring setup based on evolving requirements and system dynamics. Fine-tune alerting thresholds, add new metrics, and adjust dashboard visualizations to ensure your monitoring remains effective and relevant.

Generate test alerts in Prometheus to confirm that rules fire and reach Alertmanager.
Monitor xMatters for incoming alerts and ensure they are correctly formatted and actionable.
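One simple way to generate a test alert is a rule whose expression is always true. The sketch below uses vector(1) for that. It assumes an operator-based install that accepts PrometheusRule resources and a rule selector matching release: prometheus; the alert name is hypothetical.

```shell
# Always-firing rule for exercising the notification path end to end.
# Remove it once the test is complete.
cat > test-alert-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test-alert
  namespace: monitoring
  labels:
    release: prometheus   # assumed to match the chart's rule selector
spec:
  groups:
    - name: test.rules
      rules:
        - alert: PipelineTestAlert
          expr: vector(1)
          labels:
            severity: info
          annotations:
            summary: "Test alert to verify the notification path"
EOF

# Against a live cluster (commented out because it needs cluster access):
# kubectl apply -f test-alert-rule.yaml
# kubectl delete -f test-alert-rule.yaml   # clean up after the test
```

After applying the rule, the alert should show up on the Alerts page of the Prometheus UI and then in whichever receiver Alertmanager routes it to.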