How to Monitor Your Homelab with Grafana and Prometheus

You've built your homelab. Maybe it's a single Proxmox node running a handful of containers, or maybe you've gone full rack mode with multiple servers humming away in a closet. Either way, there's one question that eventually hits every homelabber: "What's actually going on with my stuff?"

That's where monitoring comes in. And when it comes to monitoring in 2024, the combo of Prometheus and Grafana is basically the gold standard. It's what the pros use, it's free, and once you set it up, you'll wonder how you ever lived without those pretty graphs.

Let's get you set up.

Why Bother Monitoring?

Before we dive into the how, let's talk about the why. Monitoring your homelab gives you:

Early warning signs - Catch a failing drive or runaway process before it takes down your Plex server during movie night
Historical data - See trends over time. Is your RAM usage slowly creeping up? When did that start?
Capacity planning - Know when you actually need to upgrade versus when you're just being paranoid
The cool factor - Let's be honest, dashboards look awesome on that spare monitor

The Stack: What We're Building

Here's what each piece does:

Prometheus - The data collector. It scrapes metrics from your services at regular intervals and stores them in a time-series database. Think of it as the warehouse where all your numbers live.
Grafana - The visualization layer. It connects to Prometheus and turns those numbers into beautiful, interactive dashboards.
Node Exporter - A small agent that exposes system metrics (CPU, memory, disk, network) in a format Prometheus understands.
cAdvisor - Container metrics. If you're running Docker, this tells you what each container is doing.
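
To make "a format Prometheus understands" a bit more concrete: exporters serve plain text over HTTP, one sample per line, and Prometheus scrapes that text on a schedule. Here's a small Python sketch that parses a few such lines. The sample values are invented for illustration, and the parser is deliberately simplified (it ignores timestamps and escaped characters inside label values), so treat it as a way to read the format rather than a replacement for a real client library.

```python
import re

# A few lines in the Prometheus text exposition format.
# The numbers are invented for illustration, not real measurements.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1000.5
node_cpu_seconds_total{cpu="0",mode="user"} 250.25
node_memory_MemTotal_bytes 8000000000
"""

# metric_name, optional {labels}, then the value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # simplified parser: skip anything it cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

The labels in braces (cpu, mode) are what you'll filter and group by later in PromQL.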

The Docker Compose Setup

Here's a complete, working docker-compose.yml that you can drop into your homelab and run today. Create a new directory for your monitoring stack:

mkdir ~/monitoring && cd ~/monitoring

Create your docker-compose.yml:

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    networks:
      - monitoring
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:

Now create the Prometheus configuration. Make a directory and config file:

mkdir prometheus

Create prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # Add more hosts here as your homelab grows
  # - job_name: 'other-server'
  #   static_configs:
  #     - targets: ['192.168.1.50:9100']

Fire it up:

docker compose up -d

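Give it a minute to start, then open http://your-server-ip:3000 and log in with admin/changeme. Change that password immediately: GF_SECURITY_ADMIN_PASSWORD only sets the initial password when Grafana first creates its database, so either pick a real one before the first start or change it in the UI afterwards.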
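
It's also worth confirming that Prometheus is actually scraping all three targets. The Targets page under Status in the Prometheus UI (port 9090) shows this, and the same information is available from the HTTP API at /api/v1/targets. Below is a minimal Python sketch, assuming Prometheus is reachable at localhost:9090 from wherever you run it. The EXAMPLE payload is hand-written and trimmed to the fields the code reads; it is not captured output.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets document from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Return (job, instance, health) for every active target."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        rows.append((labels.get("job", ""), labels.get("instance", ""), target.get("health", "")))
    return rows


def unhealthy(payload):
    """Return only the targets whose health is not 'up'."""
    return [row for row in summarize_targets(payload) if row[2] != "up"]


# Hand-written example of the response shape (trimmed, not captured output).
EXAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"}, "health": "up"},
            {"labels": {"job": "node-exporter", "instance": "node-exporter:9100"}, "health": "up"},
            {"labels": {"job": "cadvisor", "instance": "cadvisor:8080"}, "health": "down"},
        ]
    },
}

for job, instance, health in summarize_targets(EXAMPLE):
    print(f"{job:15} {instance:22} {health}")
```

Against a running stack, swap EXAMPLE for fetch_targets(). Anything returned by unhealthy() is a target to investigate before you start building dashboards on top of it.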

Connecting Grafana to Prometheus

Once you're logged into Grafana:

Go to Connections > Data sources
Click Add data source
Select Prometheus
For the URL, enter: http://prometheus:9090
Scroll down and click Save & test

You should see a green "Successfully queried the Prometheus API" message. If you don't, double-check that both containers are on the same Docker network.
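
Clicking through the UI is fine for a one-off. If you rebuild the stack often, the same data source can be created through Grafana's HTTP API instead. Here's a minimal Python sketch, assuming the admin/changeme credentials and port 3000 from the Compose file above. The function only builds the request so you can inspect it first; nothing is sent until you pass it to urlopen.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://prometheus:9090",  # container name from the Compose file
        "access": "proxy",                # Grafana's backend makes the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_datasource_request("http://localhost:3000", "admin", "changeme")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

To actually create the data source, call urllib.request.urlopen(request) from a machine that can reach Grafana. The URL field uses the container name for the same reason as in the manual steps: Grafana talks to Prometheus over the shared Docker network.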

What Metrics Should You Track?

With Node Exporter and cAdvisor running, you've got access to hundreds of metrics. Here are the ones that actually matter for a homelab:

System Health Basics

CPU Usage - Both overall and per-core. Helps identify if a single core is getting hammered.
Memory Usage - Total, used, cached, and available. Linux loves to use RAM for cache, so "used" can be misleading.
Disk Space - Nothing kills a server faster than a full disk. Set alerts at 80% and 90%.
Disk I/O - Read/write speeds and IOPS. Useful for spotting bottlenecks.
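Network Traffic - Bytes in/out per interface. Great for spotting a saturated link. One catch: Node Exporter running in a bridged container, as in the Compose file above, reports that container's interfaces, so run it with host networking if you want the host's real NICs.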
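
The memory point deserves a quick worked example, because it trips up almost everyone. The Python sketch below compares a "used" percentage based on free memory with one based on available memory. The numbers are invented for a hypothetical 16 GiB host with a large page cache.

```python
GIB = 1024**3

# Invented numbers for a 16 GiB host whose page cache holds about 9 GiB.
mem_total = 16 * GIB      # node_memory_MemTotal_bytes
mem_free = 1 * GIB        # node_memory_MemFree_bytes
mem_available = 10 * GIB  # node_memory_MemAvailable_bytes


def percent_used(total, remaining):
    """Share of total that is not covered by 'remaining', as a percentage."""
    return (1 - remaining / total) * 100


naive = percent_used(mem_total, mem_free)
realistic = percent_used(mem_total, mem_available)
print(f"Based on MemFree:      {naive:.2f}% used")
print(f"Based on MemAvailable: {realistic:.2f}% used")
```

With these numbers the MemFree view says 93.75% used while the MemAvailable view says 37.50%. The gap is mostly cache the kernel can hand back on demand, which is why dashboards and alerts are better built on MemAvailable.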

Container Metrics

Container CPU - Which containers are working hardest?
Container Memory - Spot memory leaks before they become problems.
Container Network - See exactly how much traffic each container generates.
Container Restarts - A container that keeps restarting is trying to tell you something.

Temperature (If Available)

Node Exporter can expose CPU and other hardware temps if your system supports it. Worth monitoring if your homelab lives somewhere warm or you're pushing overclocked hardware.

Setting Up Your First Dashboard

You could build dashboards from scratch, but why would you? The Grafana community has already done the hard work. Here are some excellent pre-built dashboards you can import in seconds:

Node Exporter Full (Dashboard ID: 1860)

This is the classic. It gives you everything about your host system in one comprehensive view. To import it:

Go to Dashboards > New > Import
Enter 1860 in the "Import via grafana.com" field
Click Load
Select your Prometheus data source
Click Import

Boom. Instant professional-looking dashboard with CPU, memory, disk, network, and a ton more.

Docker and System Monitoring (Dashboard ID: 893)

Great for container-focused monitoring. Shows resource usage per container alongside system metrics.

cAdvisor Exporter (Dashboard ID: 14282)

A clean, modern dashboard focused specifically on container metrics from cAdvisor.

Building a Custom Overview Dashboard

Imported dashboards are great, but you'll probably want a custom "at a glance" dashboard for your specific setup. Here are some useful PromQL queries to get you started:

CPU Usage Percentage

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
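
If that query looks like magic, here's the arithmetic behind it. node_cpu_seconds_total is a counter of seconds spent in each mode, so its per-second rate in idle mode is the fraction of time a core sat idle. The Python sketch below works through it with two scrapes of a hypothetical 2-core host. The values are invented, and the reset handling is a simplification of what Prometheus does.

```python
def per_second_rate(older, newer):
    """Per-second increase between two (timestamp, counter_value) samples.

    A counter that went down is treated as a reset to zero, which is a
    simplified version of how Prometheus handles counter resets.
    """
    (t0, v0), (t1, v1) = older, newer
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


# Idle-seconds counters for a hypothetical 2-core host, scraped 15 s apart.
idle_samples = {
    "cpu0": ((0, 1000.0), (15, 1012.0)),  # idle for 12 of 15 seconds
    "cpu1": ((0, 2000.0), (15, 2013.5)),  # idle for 13.5 of 15 seconds
}

idle_fractions = [per_second_rate(a, b) for a, b in idle_samples.values()]
cpu_usage_percent = 100 - (sum(idle_fractions) / len(idle_fractions)) * 100
print(f"CPU usage: {cpu_usage_percent:.1f}%")
```

The cores were idle 80% and 90% of the time, so the average idle fraction is 0.85 and the script prints a CPU usage of 15.0%. irate does this with the last two samples in the window, which makes graphs responsive; rate averages over the whole window and is the calmer choice for alerts.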

Memory Usage Percentage

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Disk Usage Percentage (Root Filesystem)

100 - ((node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100)

Network Traffic (Received, bytes per second; replace eth0 with your interface name)

irate(node_network_receive_bytes_total{device="eth0"}[5m])

Container Memory Usage

container_memory_usage_bytes{name!=""}

Container CPU Usage

sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) * 100
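
These queries aren't tied to Grafana. Prometheus answers them over its HTTP API at /api/v1/query, which is handy for quick scripts and sanity checks. Here's a minimal Python sketch, assuming Prometheus is reachable at localhost:9090. The EXAMPLE payload is hand-written to show the response shape, and the value in it is invented.

```python
import json
import urllib.parse
import urllib.request

MEMORY_PCT = "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100"


def instant_query(expr, base_url="http://localhost:9090"):
    """Run an instant query against the Prometheus HTTP API and return the JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def vector_values(payload, label="instance"):
    """Map the chosen label to a float for each series in a vector result."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


# Hand-written example of the response shape; the number is invented.
EXAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "node-exporter:9100"}, "value": [1700000000, "42.5"]},
        ],
    },
}

print(vector_values(EXAMPLE))
```

On a live stack, vector_values(instant_query(MEMORY_PCT)) should return one entry per scraped host. Note that sample values arrive as strings in the JSON, which is why the code converts them with float().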

Setting Up Alerts

Pretty graphs are nice, but alerts are where monitoring becomes actually useful. Grafana can send notifications to email, Discord, Slack, Telegram, and dozens of other services when things go wrong.

Some alerts every homelabber should have:

Disk space above 85% - Gives you time to clean up or add storage
Memory above 90% for 5+ minutes - Might indicate a memory leak
Container restart count increasing - Something's crashlooping
Host unreachable - If you're monitoring multiple machines
High CPU for extended periods - Could be crypto mining malware or a runaway process

To set up alerts in Grafana, go to Alerting > Alert rules and create rules based on your PromQL queries. Then configure a contact point (Discord webhook, email, etc.) to receive notifications.
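
The "for 5+ minutes" part matters more than the threshold itself, because it's what stops a brief spike from pinging your phone. The Python sketch below illustrates the idea: an alert only fires once the condition has held continuously for the whole hold period. This is an illustration of the concept with invented numbers, not Grafana's actual evaluation engine.

```python
def firing(samples, threshold, hold_seconds):
    """Return True if the most recent samples have all exceeded the
    threshold for at least hold_seconds.

    samples is a list of (timestamp_seconds, value) in increasing time order.
    """
    breach_started = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp
        else:
            breach_started = None  # condition cleared, start over
    if breach_started is None:
        return False
    return samples[-1][0] - breach_started >= hold_seconds


# Memory usage in percent, one sample per minute (invented numbers).
brief_spike = [(0, 70), (60, 95), (120, 72), (180, 71), (240, 70), (300, 69), (360, 70)]
sustained = [(0, 70), (60, 91), (120, 92), (180, 93), (240, 95), (300, 94), (360, 96)]

print("brief spike fires:", firing(brief_spike, threshold=90, hold_seconds=300))
print("sustained fires:  ", firing(sustained, threshold=90, hold_seconds=300))
```

The one-minute spike never fires, while the sustained run fires once it has stayed above 90% for five minutes. When you create the rule in Grafana, the pending period setting plays the role of hold_seconds here.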

Monitoring Multiple Hosts

Got more than one server? No problem. Just install Node Exporter on each machine and add them to your Prometheus config:

  - job_name: 'proxmox-node'
    static_configs:
      - targets: ['192.168.1.10:9100']
        labels:
          instance: 'proxmox'

  - job_name: 'nas'
    static_configs:
      - targets: ['192.168.1.20:9100']
        labels:
          instance: 'synology'
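
Once the list of machines grows past a handful, typing these stanzas by hand gets error-prone. One option is to keep a small inventory and render the jobs from it. The Python sketch below does that using the same layout as the snippet above. The scrape_jobs helper and the HOSTS inventory are made up for this example; they aren't part of Prometheus.

```python
def scrape_jobs(hosts, port=9100):
    """Render one scrape job per host, indented to sit under scrape_configs.

    hosts maps a job name to (ip_address, instance_label).
    """
    lines = []
    for job, (address, instance) in hosts.items():
        lines += [
            f"  - job_name: '{job}'",
            "    static_configs:",
            f"      - targets: ['{address}:{port}']",
            "        labels:",
            f"          instance: '{instance}'",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical inventory; replace with your own machines.
HOSTS = {
    "proxmox-node": ("192.168.1.10", "proxmox"),
    "nas": ("192.168.1.20", "synology"),
}

print(scrape_jobs(HOSTS), end="")
```

Paste the output under scrape_configs in prometheus/prometheus.yml. The indentation is part of the YAML structure, so keep it exactly as printed.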

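After updating the config, tell Prometheus to reload it. This endpoint is available because the Compose file starts Prometheus with --web.enable-lifecycle: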

curl -X POST http://localhost:9090/-/reload

Tips for Long-Term Success

Set reasonable retention - The 30 days configured in the Compose file (Prometheus itself defaults to 15) is fine for most homelabs. Longer retention grows disk usage roughly in proportion.
Don't over-monitor - You don't need 1-second scrape intervals. 15-30 seconds is plenty for homelab use.
Back up your Grafana - Your dashboards live in the grafana_data volume, and they represent real work. Include that volume in your backup strategy.
Use labels wisely - Good labeling makes filtering and grouping in Grafana much easier.
Start simple - Import one dashboard, get comfortable, then expand. You don't need to monitor everything on day one.
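
To put rough numbers on the retention tip, the Prometheus documentation gives a rule of thumb for sizing: retention time in seconds, times ingested samples per second, times bytes per sample, with samples averaging about 1 to 2 bytes each. The Python sketch below applies it. The figure of 3,000 active series is an assumption for a single host plus a few containers, so check your own instance before trusting the result.

```python
def estimated_tsdb_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed workload: 3,000 active series scraped every 15 seconds.
for days in (30, 90, 365):
    gib = estimated_tsdb_bytes(3000, 15, days) / 1024**3
    print(f"{days:>3} days: about {gib:.2f} GiB")
```

Halving the scrape interval doubles the estimate, and so does doubling the number of series, which is the practical reason not to over-monitor.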

Wrapping Up

Monitoring might seem like overkill for a homelab, but trust me, the first time you catch a problem before it becomes a disaster, you'll be glad you set this up. Plus, there's something deeply satisfying about watching those graphs tick along, knowing exactly what your hardware is doing at any moment.

The Prometheus and Grafana combo gives you enterprise-grade monitoring for free. It scales from a single Raspberry Pi to a full rack of servers. And once you get the basics running, there's a whole world of exporters for specific applications: databases, web servers, smart home devices, and pretty much anything else you can think of.

Now go forth and graph all the things.