# 🚀 FreeLeaps DevOps Learning Path for Junior Engineers
> **Production-Ready Kubernetes & DevOps Documentation**
> *Your gateway to understanding our actual infrastructure and becoming a DevOps expert*
---
## 📋 **Table of Contents**
1. [🎯 **Quick Start Guide**](#-quick-start-guide)
2. [🏗️ **Your Production Infrastructure**](#-your-production-infrastructure)
3. [📚 **Core Learning Materials**](#-core-learning-materials)
4. [🔧 **Practical Exercises**](#-practical-exercises)
5. [⚡ **Essential Commands**](#-essential-commands)
6. [🎓 **Learning Path**](#-learning-path)
7. [🔍 **Production Troubleshooting**](#-production-troubleshooting)
8. [📖 **Additional Resources**](#-additional-resources)
---
## 🎯 **Quick Start Guide**
### **🚀 First Day Checklist**
- [ ] **Access your production cluster**: `kubectl config use-context your-production-cluster`
- [ ] **Explore the management UI**: [RabbitMQ Management UI](#-exercise-2-rabbitmq-management-ui)
- [ ] **Check ArgoCD**: Visit `https://argo.mathmast.com`
- [ ] **Review monitoring**: Access Grafana dashboards
- [ ] **Understand your apps**: Check `freeleaps-devops-reconciler` status
### **🔑 Essential Access Points**
```bash
# Your production cluster access
kubectl config get-contexts
kubectl get nodes -o wide
# Your actual services
kubectl get svc -A | grep -E "(rabbitmq|argocd|jenkins|gitea)"
# Your actual namespaces
kubectl get namespaces | grep freeleaps
```
---
## 🏗️ **Your Production Infrastructure**
### **🌐 Production Domains & Services**
| **Service** | **Production URL** | **Purpose** | **Access** |
|-------------|-------------------|-------------|------------|
| **ArgoCD** | `https://argo.mathmast.com` | GitOps deployment | Web UI |
| **Gitea** | `https://gitea.freeleaps.mathmast.com` | Git repository | Web UI |
| **Jenkins** | `http://jenkins.freeleaps.mathmast.com` | CI/CD pipelines | Web UI (Internal access only) |
| **RabbitMQ** | `http://rabbitmq:15672` | Message broker | Management UI |
| **Grafana** | `https://grafana.mathmast.com` | Monitoring | Dashboards |
### **🔧 Production Architecture**
```
┌─────────────────────────────────────────────────────────────┐
│                  PRODUCTION INFRASTRUCTURE                  │
├─────────────────────────────────────────────────────────────┤
│  Azure Load Balancer (4.155.160.32)                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
│  │ Ingress-NGINX   │  │ cert-manager    │  │ ArgoCD       │ │
│  │ Controller      │  │ (Let's Encrypt) │  │ (GitOps)     │ │
│  └─────────────────┘  └─────────────────┘  └──────────────┘ │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
│  │ RabbitMQ        │  │ Jenkins         │  │ Gitea        │ │
│  │ (Message Q)     │  │ (CI/CD)         │  │ (Git Repo)   │ │
│  └─────────────────┘  └─────────────────┘  └──────────────┘ │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
│  │ freeleaps-      │  │ freeleaps-      │  │ freeleaps-   │ │
│  │ devops-         │  │ apps            │  │ monitoring   │ │
│  │ reconciler      │  │ (Your Apps)     │  │ (Metrics)    │ │
│  └─────────────────┘  └─────────────────┘  └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### **📊 Production Namespaces**
```bash
# Your actual namespaces
freeleaps-alpha # Alpha environment
freeleaps-prod # Production environment
freeleaps-devops-system # DevOps tools
freeleaps-controls-system # Control plane
freeleaps-monitoring-system # Monitoring stack
```
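To confirm that the namespaces listed above actually exist, you can compare them against what the cluster reports. The following is a minimal sketch, not part of the FreeLeaps tooling: `missing_namespaces` is a hypothetical helper name, and it only assumes that namespace names arrive on stdin, one per line.

```bash
# Hypothetical helper: print every expected FreeLeaps namespace that is
# NOT present in the list of namespace names read from stdin.
missing_namespaces() {
  present="$(cat)"
  for ns in freeleaps-alpha freeleaps-prod freeleaps-devops-system \
            freeleaps-controls-system freeleaps-monitoring-system; do
    # -x matches the whole line, -F treats the name as a literal string
    printf '%s\n' "$present" | grep -qxF -- "$ns" || echo "$ns"
  done
}

# Usage against a live cluster (prints nothing when all namespaces exist):
#   kubectl get namespaces -o name | sed 's|^namespace/||' | missing_namespaces
```

Reading from stdin keeps the check independent of cluster access, so it can be tried on saved output first.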
---
## 📚 **Core Learning Materials**
### **🎓 Phase 1: Kubernetes Fundamentals**
- **[Kubernetes Core Concepts Guide](Kubernetes_Core_Concepts_Guide.md)** - *Start here!*
  - **Production Connection**: Your actual pods, services, and deployments
  - **Real Examples**: Based on your `freeleaps-devops-reconciler` deployment
  - **Hands-on**: Practice with your actual cluster
- **[PVC Deep Dive Guide](PVC_Deep_Dive_Guide.md)** - *Storage fundamentals*
  - **Production Connection**: Your Azure disk storage classes
  - **Real Examples**: How your apps use persistent storage
  - **Troubleshooting**: Common storage issues in your environment
### **🔧 Phase 2: DevOps Infrastructure**
- **[Custom Resources & Operators Guide](Custom_Resources_And_Operators_Guide.md)** - *Advanced concepts*
  - **Production Connection**: Your `freeleaps-devops-reconciler` operator
  - **Real Examples**: How your CRDs work in production
  - **Architecture**: Understanding your operator pattern
- **[Reconciler Architecture Deep Dive](Reconciler_Architecture_Deep_Dive.md)** - *Your core system*
  - **Production Connection**: Your actual reconciler deployment
  - **Real Examples**: How your DevOps automation works
  - **Troubleshooting**: Common reconciler issues
- **[Reconciler Framework Analysis](Reconciler_Framework_Analysis.md)** - *Technical deep dive*
  - **Production Connection**: Your Python/Kopf operator framework
  - **Real Examples**: Code analysis from your actual implementation
  - **Best Practices**: How to improve your reconciler
### **🌐 Phase 3: Networking & Ingress**
- **[Ingress Setup & Redirects Guide](Ingress_Setup_And_Redirects_Guide.md)** - *Web traffic management*
  - **Production Connection**: Your actual ingress controllers
  - **Real Examples**: How your domains are configured
  - **Troubleshooting**: Common ingress issues
- **[Current Ingress Analysis](Current_Ingress_Analysis.md)** - *Your actual setup*
  - **Production Connection**: Your real ingress configurations
  - **Real Examples**: Your actual domain routing
  - **Monitoring**: How to check ingress health
### **📨 Phase 4: Messaging & Communication**
- **[RabbitMQ Management Analysis](RabbitMQ_Management_Analysis.md)** - *Message broker*
  - **Production Connection**: Your actual RabbitMQ deployment
  - **Real Examples**: Your message queues and exchanges
  - **Management UI**: How to use the built-in management interface
### **🗄️ Phase 4.5: Database Management**
- **[PostgreSQL & Gitea Integration Guide](PostgreSQL_Gitea_Integration_Guide.md)** - *Database operations*
  - **Production Connection**: Your actual PostgreSQL deployments (Alpha vs Production)
  - **Real Examples**: How Gitea connects to PostgreSQL in your environments
  - **Data Access**: How to access and manage your Gitea database
  - **Monitoring**: Database health checks and performance monitoring
### **🚀 Phase 5: Operations & Deployment**
- **[Kubernetes Bootstrap Guide](Kubernetes_Bootstrap_Guide.md)** - *Cluster setup*
  - **Production Connection**: How your cluster was built
  - **Real Examples**: Your actual bootstrap process
  - **Maintenance**: How to maintain your cluster
- **[Azure K8s Node Addition Runbook](Azure_K8s_Node_Addition_Runbook.md)** - *Scaling*
  - **Production Connection**: How to add nodes to your cluster
  - **Real Examples**: Your actual node addition process
  - **Automation**: Scripts for node management
---
## 🔧 **Practical Exercises**
### **🎯 Exercise 1: Explore Your Production Cluster**
```bash
# 1. Connect to your cluster
kubectl config use-context your-production-cluster
# 2. Explore your namespaces
kubectl get namespaces | grep freeleaps
# 3. Check your actual deployments
kubectl get deployments -A | grep freeleaps
# 4. Monitor your reconciler
kubectl logs -f deployment/freeleaps-devops-reconciler -n freeleaps-devops-system
```
### **🎯 Exercise 2: RabbitMQ Management UI**
```bash
# 1. Port forward to RabbitMQ management UI
kubectl port-forward svc/rabbitmq-headless -n freeleaps-alpha 15672:15672
# 2. Access the UI: http://localhost:15672
# Username: user
# Password: NjlhHFvnDuC7K0ir
# 3. Explore your queues:
# - freeleaps.devops.reconciler.queue
# - freeleaps.devops.reconciler.input
```
### **🎯 Exercise 3: ArgoCD GitOps**
```bash
# 1. Access ArgoCD: https://argo.mathmast.com
# 2. Explore your applications:
# - freeleaps-devops-reconciler
# - freeleaps-apps
# - monitoring stack
# 3. Check deployment status
kubectl get applications -n argocd
```
### **🎯 Exercise 4: Monitor Your Infrastructure**
```bash
# 1. Check cluster health
kubectl get nodes -o wide
# 2. Monitor resource usage
kubectl top nodes
kubectl top pods -A
# 3. Check ingress status
kubectl get ingress -A
```
---
## ⚡ **Essential Commands**
### **🔍 Production Monitoring**
```bash
# Your cluster health
kubectl get nodes -o wide
kubectl get pods -A --field-selector=status.phase!=Running
# Your services
kubectl get svc -A | grep -E "(rabbitmq|argocd|jenkins|gitea)"
# Your reconciler status
kubectl get deployment freeleaps-devops-reconciler -n freeleaps-devops-system
kubectl logs -f deployment/freeleaps-devops-reconciler -n freeleaps-devops-system
```
### **🔧 Troubleshooting**
```bash
# Check reconciler health
kubectl describe deployment freeleaps-devops-reconciler -n freeleaps-devops-system
# Check RabbitMQ status
kubectl get pods -n freeleaps-alpha | grep rabbitmq
kubectl logs -f deployment/rabbitmq -n freeleaps-alpha
# Check ingress issues
kubectl describe ingress -A
kubectl get events -A --sort-by='.lastTimestamp'
```
### **📊 Resource Management**
```bash
# Monitor resource usage
kubectl top nodes
kubectl top pods -A
# Check storage
kubectl get pvc -A
kubectl get pv
# Check networking
kubectl get svc -A
kubectl get endpoints -A
```
---
## 🎓 **Learning Path**
### **📅 Week 1: Foundations**
- **Day 1-2**: [Kubernetes Core Concepts](Kubernetes_Core_Concepts_Guide.md)
- **Day 3-4**: [PVC Deep Dive](PVC_Deep_Dive_Guide.md)
- **Day 5**: Practice exercises with your actual cluster
### **📅 Week 2: DevOps Infrastructure**
- **Day 1-2**: [Custom Resources & Operators](Custom_Resources_And_Operators_Guide.md)
- **Day 3-4**: [Reconciler Architecture](Reconciler_Architecture_Deep_Dive.md)
- **Day 5**: [Reconciler Framework Analysis](Reconciler_Framework_Analysis.md)
### **📅 Week 3: Networking & Communication**
- **Day 1-2**: [Ingress Setup & Redirects](Ingress_Setup_And_Redirects_Guide.md)
- **Day 3**: [Current Ingress Analysis](Current_Ingress_Analysis.md)
- **Day 4-5**: [RabbitMQ Management](RabbitMQ_Management_Analysis.md)
### **📅 Week 4: Operations & Production**
- **Day 1-2**: [Kubernetes Bootstrap](Kubernetes_Bootstrap_Guide.md)
- **Day 3-4**: [Azure Node Addition](Azure_K8s_Node_Addition_Runbook.md)
- **Day 5**: Production troubleshooting and monitoring
---
## 🔍 **Production Troubleshooting**
### **🚨 Common Issues & Solutions**
#### **1. Reconciler Not Working**
```bash
# Check reconciler status
kubectl get deployment freeleaps-devops-reconciler -n freeleaps-devops-system
kubectl logs -f deployment/freeleaps-devops-reconciler -n freeleaps-devops-system
# Check RabbitMQ connection
kubectl exec -it deployment/rabbitmq -n freeleaps-alpha -- rabbitmqctl list_connections
```
#### **2. Ingress Issues**
```bash
# Check ingress controller
kubectl get pods -n ingress-nginx
kubectl logs -f deployment/ingress-nginx-controller -n ingress-nginx
# Check certificates
kubectl get certificates -A
kubectl describe certificate -n your-namespace
```
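When the certificate list is long, it helps to show only the entries that are not ready. The sketch below is a hypothetical helper (`not_ready_certificates` is an illustrative name), and it assumes cert-manager's default column layout for `kubectl get certificates -A --no-headers`: NAMESPACE, NAME, READY, SECRET, AGE. If your output has different columns, adjust the field numbers.

```bash
# Hypothetical helper: read `kubectl get certificates -A --no-headers`
# output on stdin and print "namespace/name" for every Certificate whose
# READY column (assumed to be field 3) is anything other than "True".
not_ready_certificates() {
  awk '$3 != "True" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get certificates -A --no-headers | not_ready_certificates
```

Each name it prints can then be passed to `kubectl describe certificate <name> -n <namespace>` to see the failing condition.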
#### **3. Storage Problems**
```bash
# Check PVC status
kubectl get pvc -A
kubectl describe pvc your-pvc-name -n your-namespace
# Check storage classes
kubectl get storageclass
```
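Most storage problems show up as a PVC that never reaches the `Bound` state. A minimal sketch for spotting those, assuming the default `kubectl get pvc -A --no-headers` layout in which STATUS is the third column; `unbound_pvcs` is a hypothetical helper name, not an existing script in this repository.

```bash
# Hypothetical helper: read `kubectl get pvc -A --no-headers` output on
# stdin and print "namespace/name (STATUS)" for every claim that is not Bound.
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " (" $3 ")" }'
}

# Usage against a live cluster:
#   kubectl get pvc -A --no-headers | unbound_pvcs
```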
### **📊 Monitoring & Alerts**
#### **Key Metrics to Watch**
- **Cluster health**: Node status, pod restarts
- **Resource usage**: CPU, memory, disk
- **Network**: Ingress traffic, service connectivity
- **Applications**: Reconciler health, RabbitMQ queues
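
The pod restart signal listed above can be read straight from `kubectl` output. This is a rough sketch under stated assumptions: it relies on the default `kubectl get pods -A --no-headers` column order (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE), `pods_restarting` is a hypothetical helper name, and the default threshold of 5 restarts is an arbitrary illustration rather than a team standard.

```bash
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on
# stdin and print "<restarts> <namespace>/<pod>" for pods whose restart
# count is at or above the threshold, highest count first.
pods_restarting() {
  threshold="${1:-5}"   # illustrative default, tune to your environment
  awk -v min="$threshold" '$5 + 0 >= min { print $5, $1 "/" $2 }' | sort -rn
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | pods_restarting 5
```
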
#### **Alerting Setup**
```bash
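# Check Prometheus scrape definitions
# (assumes the Prometheus Operator CRDs are installed; live target health is on the Prometheus UI "Targets" page)
kubectl get servicemonitors,podmonitors -n freeleaps-monitoring-system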
# Check Grafana dashboards
# Access: https://grafana.mathmast.com
```
---
## 📖 **Additional Resources**
### **🔗 Official Documentation**
- **[Kubernetes Documentation](https://kubernetes.io/docs/)** - Official K8s docs
- **[ArgoCD Documentation](https://argo-cd.readthedocs.io/)** - GitOps platform
- **[RabbitMQ Documentation](https://www.rabbitmq.com/documentation.html)** - Message broker
- **[Helm Documentation](https://helm.sh/docs/)** - Package manager
### **🎥 Video Resources**
- **Kubernetes Crash Course**: [TechWorld with Nana](https://www.youtube.com/watch?v=s_o8dwzRlu4)
- **ArgoCD Tutorial**: [ArgoCD Official](https://www.youtube.com/watch?v=MeU5_k9ssOY)
- **RabbitMQ Basics**: [RabbitMQ Official](https://www.youtube.com/watch?v=deG25y_r6OI)
### **📚 Books**
- **"Kubernetes in Action"** by Marko Lukša
- **"GitOps and Kubernetes"** by Billy Yuen
- **"RabbitMQ in Depth"** by Gavin M. Roy
### **🛠️ Tools & Utilities**
- **[k9s](https://k9scli.io/)** - Terminal UI for K8s
- **[Lens](https://k8slens.dev/)** - Desktop IDE for K8s
- **[kubectx](https://github.com/ahmetb/kubectx)** - Context switching
---
## 🎯 **Next Steps**
### **🚀 Immediate Actions**
1. **Set up your development environment** with kubectl and helm
2. **Access your production cluster** and explore the resources
3. **Complete the practical exercises** in this guide
4. **Open the monitoring dashboards** and learn what the key metrics mean
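
Before starting on the first action, it can be worth checking that the required CLI tools are installed. A minimal sketch, where `missing_tools` is a hypothetical helper name and the tool list is whatever you pass in:

```bash
# Hypothetical helper: print the name of each given CLI tool that is not
# found on PATH. Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage for the tools named in this guide:
#   missing_tools kubectl helm
```
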
### **📈 Career Development**
1. **Get certified**: [CKA (Certified Kubernetes Administrator)](https://www.cncf.io/certification/cka/)
2. **Contribute**: Help improve the reconciler and infrastructure
3. **Learn**: Stay up to date with the latest Kubernetes and DevOps practices
4. **Share**: Document your learnings and share with the team
### **🤝 Team Collaboration**
- **Code reviews**: Review reconciler changes
- **Documentation**: Improve this guide based on your experience
- **Mentoring**: Help other junior engineers
- **Innovation**: Suggest improvements to the infrastructure
---
## 📞 **Support & Contact**
### **🆘 Getting Help**
- **Team Slack**: #devops-support channel
- **Documentation**: This guide and linked resources
- **Code Reviews**: GitHub pull requests
- **Pair Programming**: Schedule sessions with senior engineers
### **📝 Feedback**
- **Documentation**: Create issues for improvements
- **Process**: Suggest workflow optimizations
- **Tools**: Recommend new tools or improvements
---
**🎉 Welcome to the FreeLeaps DevOps team! You're now part of a production infrastructure that serves real users. Take ownership, learn continuously, and help us build amazing things!**
---
*Last updated: 2025-09-03*
*Maintained by: FreeLeaps DevOps Team*