PVC Deep Dive Guide: Understanding Persistent Storage in Kubernetes
🎯 Overview
This guide explains Persistent Volume Claims (PVCs) in detail, why they're essential, and how your current Kubernetes setup uses them. PVCs are crucial for applications that need to store data that survives pod restarts, crashes, or migrations.
📊 How PVCs Work: Visual Explanation
🔄 PVC Lifecycle Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│ PVC LIFECYCLE │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ DEVELOPER │ │ PVC │ │ PV │ │ STORAGE │ │
│ │ Creates │ │ Requests │ │ Provides │ │ Backend │ │
│ │ PVC │ │ Storage │ │ Storage │ │ (Azure) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ │ 1. Create PVC │ │ │ │
│ │───────────────▶│ │ │ │
│ │ │ 2. Find PV │ │ │
│ │ │───────────────▶│ │ │
│ │ │ │ 3. Provision │ │
│ │ │ │───────────────▶│ │
│ │ │ │ │ 4. Create Disk │
│ │ │ │ │◀───────────────│
│ │ │ │ 5. Bind PV │ │
│ │ │ │◀───────────────│ │
│ │ │ 6. Bind PVC │ │ │
│ │ │◀───────────────│ │ │
│ │ 7. Ready │ │ │ │
│ │◀───────────────│ │ │ │
└─────────────────────────────────────────────────────────────────────────────┘
🏗️ Storage Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ STORAGE ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ KUBERNETES CLUSTER │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ POD 1 │ │ POD 2 │ │ POD 3 │ │ POD 4 │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ │
│ │ │ │ Volume │ │ │ │ Volume │ │ │ │ Volume │ │ │ │ Volume │ │ │ │
│ │ │ │ Mount │ │ │ │ Mount │ │ │ │ Mount │ │ │ │ Mount │ │ │ │
│ │ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │ │ │ │ │
│ │ └────────────────┼────────────────┼────────────────┘ │ │
│ │ │ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ PVCs │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ │ PVC: gitea │ │ PVC: mongo │ │ PVC: logs │ │ PVC: jenkins│ │ │ │
│ │ │ │ 15Gi │ │ 8Gi │ │ 1Gi │ │ 50Gi │ │ │ │
│ │ │ │ RWO │ │ RWO │ │ RWO │ │ RWO │ │ │ │
│ │ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ PVs │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ │ PV: gitea │ │ PV: mongo │ │ PV: logs │ │ PV: jenkins │ │ │ │
│ │ │ │ 15Gi │ │ 8Gi │ │ 1Gi │ │ 50Gi │ │ │ │
│ │ │ │ azure-disk │ │ azure-disk │ │ azure-disk │ │ azure-blob │ │ │ │
│ │ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AZURE STORAGE BACKEND │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Managed Disk│ │ Managed Disk│ │ Managed Disk│ │ Blob Storage│ │ │
│ │ │ 15Gi HDD │ │ 8Gi SSD │ │ 1Gi SSD │ │ 50Gi │ │ │
│ │ │ Standard │ │ Premium │ │ Standard │ │ Standard │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
🤔 Why Stateful Pods Need a PVC: The Data Persistence Problem
❌ Without PVC: Data Loss Scenario
┌─────────────────────────────────────────────────────────────────────────────┐
│ WITHOUT PVC (BAD) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ POD 1 │ │ POD 2 │ │ POD 3 │ │ POD 4 │ │
│ │ nginx:latest│ │ nginx:latest│ │ nginx:latest│ │ nginx:latest│ │
│ │ │ │ │ │ │ │ │ │
│ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │
│ │ │ /tmp │ │ │ │ /tmp │ │ │ │ /tmp │ │ │ │ /tmp │ │ │
│ │ │ (temp) │ │ │ │ (temp) │ │ │ │ (temp) │ │ │ │ (temp) │ │ │
│ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ 🔄 Pod Restart/Delete → ❌ ALL DATA LOST │
│ │
│ ❌ User uploads gone │
│ ❌ Database files gone │
│ ❌ Configuration gone │
│ ❌ Logs gone │
└─────────────────────────────────────────────────────────────────────────────┘
✅ With PVC: Data Persistence
┌─────────────────────────────────────────────────────────────────────────────┐
│ WITH PVC (GOOD) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ POD 1 │ │ POD 2 │ │ POD 3 │ │ POD 4 │ │
│ │ nginx:latest│ │ nginx:latest│ │ nginx:latest│ │ nginx:latest│ │
│ │ │ │ │ │ │ │ │ │
│ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │
│ │ │ /data │ │ │ │ /data │ │ │ │ /data │ │ │ │ /data │ │ │
│ │ │ (PVC) │ │ │ │ (PVC) │ │ │ │ (PVC) │ │ │ │ (PVC) │ │ │
│ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ └────────────────┼────────────────┼────────────────┘ │
│ │ │ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SHARED STORAGE │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ 📁 /data │ │ │
│ │ │ ├── 📄 user-uploads/ │ │ │
│ │ │ ├── 📄 database/ │ │ │
│ │ │ ├── 📄 config/ │ │ │
│ │ │ └── 📄 logs/ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ 🔄 Pod Restart/Delete → ✅ DATA PERSISTS │
│ │
│ ✅ User uploads preserved │
│ ✅ Database files preserved │
│ ✅ Configuration preserved │
│ ✅ Logs preserved │
└─────────────────────────────────────────────────────────────────────────────┘
🏭 Your Current Kubernetes Setup: PVC Analysis
📊 Your Actual PVC Usage
Based on an analysis of your codebase, here is how PVCs are currently used:
1. Gitea (Git Repository)
# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/freeleaps/helm-pkg/3rd/gitea/values.prod.yaml
persistence:
  enabled: true
  create: true
  mount: true
  claimName: gitea-shared-storage
  size: 15Gi
  accessModes:
    - ReadWriteOnce
  storageClass: azure-disk-std-lrs
  annotations:
    helm.sh/resource-policy: keep
What this means:
- ✅ Gitea uses PVC for storing repositories, user data, and configuration
- ✅ 15Gi of storage allocated for Git repositories and user data
- ✅ Azure Standard Disk (cost-effective for this use case)
- ✅ ReadWriteOnce - the volume is mounted read-write by one node at a time (pods on that same node can share it)
- ✅ Data persists when Gitea pod restarts
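For orientation, values like these typically end up as a PersistentVolumeClaim along the following lines. This is a hand-written sketch, not output from the Gitea chart: the mapping of `storageClass` to `storageClassName` is an assumption, and the rendered manifest may carry extra labels or differ in detail.

```yaml
# Sketch only: approximate PVC implied by the values above (not chart output)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitea-shared-storage
  annotations:
    helm.sh/resource-policy: keep   # Helm leaves this PVC in place on uninstall
spec:
  accessModes:
    - ReadWriteOnce                 # read-write from a single node
  storageClassName: azure-disk-std-lrs
  resources:
    requests:
      storage: 15Gi
```

Running `kubectl get pvc gitea-shared-storage -o yaml` in the Gitea namespace shows the real object for comparison.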
2. MongoDB (Database)
# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/freeleaps/helm-pkg/3rd/mongo/values.yaml
persistence:
  enabled: true
  size: 8Gi
  accessModes:
    - ReadWriteOnce
  storageClass: ""  # Uses default Azure storage class
What this means:
- ✅ MongoDB uses PVC for database files
- ✅ 8Gi of storage for database data
- ✅ Data persists when MongoDB pod restarts
- ✅ Critical for data integrity
3. Jenkins (CI/CD)
# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/cluster/manifests/freeleaps-devops-system/jenkins/values.yaml
persistence:
  enabled: true
  storageClass: azure-blob-fuse-2-std-lrs
  accessMode: "ReadWriteOnce"
  size: "50Gi"
What this means:
- ✅ Jenkins uses PVC for build artifacts, workspace data
- ✅ 50Gi of storage for build history and artifacts
- ✅ Azure Blob Storage (cost-effective for large files)
- ✅ Build history preserved across pod restarts
4. Central Storage (Logs)
# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/freeleaps/helm-pkg/centralStorage/templates/central-storage/pvc.yaml
persistence:
  enabled: true
  size: 1Gi
  accessModes:
    - ReadWriteOnce
What this means:
- ✅ Central storage uses PVC for log ingestion
- ✅ 1Gi of storage for log processing
- ✅ Logs preserved during processing
📋 PVC Usage Summary
| Application | PVC Name | Size | Storage Class | Purpose | Critical? |
|---|---|---|---|---|---|
| Gitea | gitea-shared-storage | 15Gi | azure-disk-std-lrs | Git repositories, user data | 🔴 Critical |
| MongoDB | mongodb-datadir | 8Gi | Default | Database files | 🔴 Critical |
| Jenkins | jenkins-pvc | 50Gi | azure-blob-fuse-2-std-lrs | Build artifacts, workspace | 🟡 Important |
| Central Storage | central-storage-logs-pvc | 1Gi | Default | Log processing | 🟢 Nice to have |
🤷‍♂️ Does Each Pod Need a PVC? NO!
❌ Common Misconception
"Every pod needs a PVC" - This is WRONG!
✅ Reality: PVCs Are Optional
┌─────────────────────────────────────────────────────────────────────────────┐
│ PVC DECISION TREE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DOES YOUR APP NEED PERSISTENT DATA? │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ YES │ │ NO │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ │
│ │ │ │ USE │ │ │ │ DON'T │ │ │ │
│ │ │ │ PVC │ │ │ │ USE │ │ │ │
│ │ │ │ │ │ │ │ PVC │ │ │ │
│ │ │ └─────────┘ │ │ └─────────┘ │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ Examples (YES, use a PVC): │ │
│ │ • Databases (PostgreSQL, MongoDB) │ │
│ │ • File storage (Gitea, Jenkins) │ │
│ │ • Application data (user uploads) │ │
│ │ • Logs (if you want to keep them) │ │
│ │ │ │
│ │ Examples (NO, skip the PVC): │ │
│ │ • Web servers (nginx, static content) │ │
│ │ • API servers (stateless applications) │ │
│ │ • Cache servers (Redis, Memcached) │ │
│ │ • Load balancers │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
📊 Your Current Setup Analysis
Looking at your applications:
✅ Applications WITH PVCs (Need Persistent Data)
- Gitea: Git repositories, user data, configuration
- MongoDB: Database files
- Jenkins: Build artifacts, workspace data
- Central Storage: Log processing
❌ Applications WITHOUT PVCs (Stateless)
- Nginx Ingress Controller: Stateless routing
- ArgoCD: GitOps configuration (stored in Git)
- Cert-manager: Certificate management (stateless)
- Prometheus/Grafana: Metrics (can use PVC for data retention)
🎯 PVC Considerations: When to Use Them
✅ Use PVCs When:
1. Database Applications
# Database needs persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  selector:              # required by apps/v1; label value is illustrative
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres    # must match spec.selector
    spec:
      containers:
        - name: postgres
          image: postgres:13
          volumeMounts:
            - name: db-storage
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: db-storage
          persistentVolumeClaim:
            claimName: postgres-pvc
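The Deployment above only references a claim named `postgres-pvc`; the claim itself has to exist in the same namespace. A minimal sketch of such a claim follows. The 10Gi size is an assumption for illustration, and the storage class is deliberately left out.

```yaml
# Sketch: the claim referenced by claimName in the Deployment above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi   # assumed size, adjust to the real data volume
  # storageClassName is omitted, so the cluster's default storage class applies
```

Until this claim is bound, the pod that mounts it stays in `Pending`.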
2. File Storage Applications
# File server needs persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-server
spec:
  selector:                # required by apps/v1; label value is illustrative
    matchLabels:
      app: file-server
  template:
    metadata:
      labels:
        app: file-server   # must match spec.selector
    spec:
      containers:
        - name: file-server
          image: nginx:latest
          volumeMounts:
            - name: file-storage
              mountPath: /var/www/html
      volumes:
        - name: file-storage
          persistentVolumeClaim:
            claimName: file-storage-pvc
3. Application Data
# Application needs to store user data
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:           # required by apps/v1; label value is illustrative
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app   # must match spec.selector
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          volumeMounts:
            - name: app-data
              mountPath: /app/data
      volumes:
        - name: app-data
          persistentVolumeClaim:
            claimName: app-data-pvc
❌ Don't Use PVCs When:
1. Stateless Web Servers
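# Web server doesn't need persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:               # required by apps/v1; label value is illustrative
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server   # must match spec.selector
    spec:
      containers:
        - name: web-server
          image: nginx:latest
          # No volumeMounts needed - stateless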
2. API Servers
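# API server doesn't need persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  selector:               # required by apps/v1; label value is illustrative
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server   # must match spec.selector
    spec:
      containers:
        - name: api-server
          image: my-api:latest
          # No volumeMounts needed - stateless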
3. Cache Servers
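# Cache server doesn't need persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:                # required by apps/v1; label value is illustrative
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache   # must match spec.selector
    spec:
      containers:
        - name: redis
          image: redis:latest
          # No volumeMounts needed - cache is temporary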
🔧 PVC Configuration Options
1. Access Modes
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:             # list only the mode(s) you need
    - ReadWriteOnce        # Single node read/write (most common)
    # - ReadOnlyMany       # Multiple nodes read-only
    # - ReadWriteMany      # Multiple nodes read/write (rare; the backend must support it)
  resources:
    requests:
      storage: 10Gi
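The modes above restrict access per node, not per pod. If a volume must only ever be mounted by a single pod, Kubernetes also defines a `ReadWriteOncePod` mode. The sketch below assumes a reasonably recent cluster version and a CSI-backed storage class that supports it; the claim name is hypothetical.

```yaml
# Sketch: restrict the volume to exactly one pod cluster-wide
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-pvc   # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod      # a second pod using this claim will not be scheduled
  resources:
    requests:
      storage: 10Gi
```

This can be a useful guard for databases where two writers on the same files would corrupt data.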
2. Storage Classes
# Azure Storage Classes Available (set exactly one per PVC)
storageClass: azure-disk-std-lrs            # Standard HDD (cheapest)
# storageClass: azure-disk-premium-lrs      # Premium SSD (fastest)
# storageClass: azure-blob-fuse-2-std-lrs   # Blob storage (for large files)
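Each of these names refers to a StorageClass object defined in the cluster. The actual definitions are not shown in this guide, so the following is only a sketch of what a class such as `azure-disk-std-lrs` might look like; the provisioner, `skuName`, and policy values are assumptions based on the Azure Disk CSI driver.

```yaml
# Sketch: a possible definition for azure-disk-std-lrs (values assumed)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-std-lrs
provisioner: disk.csi.azure.com      # Azure Disk CSI driver
parameters:
  skuName: Standard_LRS              # Standard HDD, locally redundant
reclaimPolicy: Delete                # Retain would keep the disk after the PVC is deleted
allowVolumeExpansion: true           # lets a PVC be grown later by raising its request
volumeBindingMode: WaitForFirstConsumer
```

`kubectl get storageclass azure-disk-std-lrs -o yaml` shows the definition actually in use.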
3. Size Considerations
# Size your PVCs appropriately (set one value per PVC)
resources:
  requests:
    storage: 1Gi        # Small: logs, config
    # storage: 10Gi     # Medium: databases
    # storage: 100Gi    # Large: file storage, backups
🚨 Common PVC Mistakes
❌ Mistake 1: Using PVC for Everything
# ❌ DON'T DO THIS
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:          # required by apps/v1; label value is illustrative
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx   # must match spec.selector
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - name: temp-storage   # ❌ Unnecessary PVC
              mountPath: /tmp
      volumes:
        - name: temp-storage
          persistentVolumeClaim:
            claimName: temp-pvc    # ❌ Waste of resources
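If the container only needs scratch space under `/tmp`, an `emptyDir` volume is one alternative: it lives on the node, survives container restarts, and is removed when the pod is deleted. The sketch below is illustrative; the labels and the 256Mi limit are assumptions.

```yaml
# Sketch: scratch space without a PVC
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - name: temp-storage
              mountPath: /tmp
      volumes:
        - name: temp-storage
          emptyDir:
            sizeLimit: 256Mi   # assumed cap; the pod is evicted if it writes more
```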
❌ Mistake 2: Not Setting a Storage Request
# ❌ DON'T DO THIS
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: unlimited-pvc
spec:
  accessModes:
    - ReadWriteOnce
  # ❌ No resources.requests.storage - the API server rejects a PVC without a size
✅ Correct Approach
# ✅ DO THIS
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: limited-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # ✅ Set appropriate size
📚 Best Practices
1. Size Appropriately
- Start small and scale up
- Monitor actual usage
- Use storage quotas
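Storage quotas are set per namespace with a ResourceQuota object. The sketch below shows the storage-related keys; the namespace, object name, and all limits are placeholder assumptions rather than values taken from this cluster.

```yaml
# Sketch: cap PVC count and total requested storage in one namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota          # hypothetical name
  namespace: my-namespace      # hypothetical namespace
spec:
  hard:
    persistentvolumeclaims: "10"   # max number of PVCs
    requests.storage: 200Gi        # sum of all PVC requests
    # Optional per-class cap, keyed by storage class name
    azure-disk-premium-lrs.storageclass.storage.k8s.io/requests.storage: 50Gi
```

`kubectl describe resourcequota storage-quota -n my-namespace` then reports used versus hard limits.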
2. Choose Right Storage Class
- Standard HDD: Cost-effective for backups, logs
- Premium SSD: Performance-critical databases
- Blob Storage: Large files, archives
3. Use Labels and Annotations
metadata:
  name: my-pvc
  labels:
    app: my-app
    environment: production
    storage-type: database
  annotations:
    helm.sh/resource-policy: keep  # Don't delete on helm uninstall
4. Monitor Usage
# Check PVC usage
kubectl get pvc
kubectl describe pvc <pvc-name>
# Check storage classes
kubectl get storageclass
# Monitor disk usage in pods
kubectl exec <pod-name> -- df -h
🔍 Your Setup Recommendations
Current State: Good!
Your current setup uses PVCs appropriately:
- ✅ Gitea: 15Gi for repositories (appropriate)
- ✅ MongoDB: 8Gi for database (appropriate)
- ✅ Jenkins: 50Gi for builds (appropriate)
- ✅ Central Storage: 1Gi for logs (appropriate)
Potential Improvements
- Monitor usage: Check actual disk usage in these PVCs
- Consider backups: Implement PVC backup strategy
- Storage quotas: Set namespace storage limits
- Performance tuning: Use Premium SSD for databases if needed
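As one possible building block for a backup strategy, CSI volume snapshots capture a point-in-time copy of a PVC. The sketch below assumes the snapshot CRDs and controller are installed and that the CSI driver behind the claim supports snapshots, neither of which is confirmed for this cluster. The snapshot name and `volumeSnapshotClassName` are hypothetical.

```yaml
# Sketch: point-in-time snapshot of the Gitea claim (prerequisites assumed)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: gitea-snapshot-example              # hypothetical name
spec:
  volumeSnapshotClassName: csi-azuredisk-vsc   # hypothetical class
  source:
    persistentVolumeClaimName: gitea-shared-storage
```

A snapshot stored alongside the disk is not a full backup on its own; tools such as Velero can copy data off-cluster.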
📖 Next Steps
1. Monitor your current PVCs:
kubectl get pvc --all-namespaces
kubectl describe pvc <pvc-name>
2. Check storage usage:
kubectl exec -it <pod-name> -- df -h
3. Learn about backup strategies:
- Azure Backup for PVCs
- Velero for Kubernetes backups
4. Consider storage optimization:
- Right-size PVCs based on actual usage
- Use appropriate storage classes for cost optimization
Last Updated: September 3, 2025 | Version: 1.0 | Maintainer: Infrastructure Team