Course Outline
EXO Infrastructure as Code
- Overview of EXO deployment patterns: single-node, multi-node, and RDMA clusters
- Automating dependency installation (Xcode, uv, Node.js, Rust) with configuration management
- Using Nix flakes for reproducible EXO builds and developer environments
- Writing Ansible playbooks or shell scripts for unattended cluster provisioning
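As a rough illustration of the dependency-automation topic above, an unattended provisioning run could begin with a preflight check. This is a minimal sketch, not EXO tooling: the executable names (`xcodebuild`, `uv`, `node`, `cargo`) are assumed stand-ins for the four dependencies listed in the outline, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed mapping from the dependencies named in the outline to an executable
# that usually signals the dependency is installed. Adjust per platform.
REQUIRED_TOOLS = {
    "Xcode": "xcodebuild",
    "uv": "uv",
    "Node.js": "node",
    "Rust": "cargo",
}


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the dependency names whose executable is not found on PATH."""
    return sorted(name for name, exe in required.items() if which(exe) is None)
```

A playbook or shell script could call `missing_tools()` first and stop before touching the node if the returned list is not empty.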
Reproducible Builds and CI Integration
- Pinning dependencies and building the dashboard in CI pipelines
- Running EXO smoke tests in GitHub Actions or GitLab CI runners
- Creating golden images and snapshot-based rollback workflows for macOS and Linux VMs
- Versioning custom model cards alongside application code
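One possible way to version custom model cards alongside application code is to commit a digest manifest and let CI flag any card whose content changed. The sketch below treats each card as opaque bytes, since the outline does not describe the card format; `card_manifest` and `changed_cards` are hypothetical helper names.

```python
import hashlib


def card_manifest(cards):
    """Map each card name to the SHA-256 hex digest of its raw bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in cards.items()}


def changed_cards(old, new):
    """Return names whose digest differs, including added and removed cards."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

In a pipeline, the manifest from the previous release would be compared with one computed from the working tree, and a non-empty result could trigger a review or a smoke test.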
Cluster Discovery and Networking Automation
- Configuring mDNS and static DNS for reliable libp2p node discovery
- Automating network profile creation and Thunderbolt bridge management on macOS
- Using custom namespaces (EXO_LIBP2P_NAMESPACE) to separate dev, staging, and prod clusters
- Firewall rules and network segmentation for multi-tenant environments
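The namespace separation mentioned above could be scripted roughly as follows. Only the variable name `EXO_LIBP2P_NAMESPACE` comes from the outline; the `prefix-environment` naming scheme, the `acme` prefix, and the `exo_env` helper are assumptions made for illustration.

```python
import os

ENVIRONMENTS = ("dev", "staging", "prod")


def exo_env(environment, base_env=None, prefix="acme"):
    """Return a copy of the environment with EXO_LIBP2P_NAMESPACE set for one tier."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    env = dict(os.environ if base_env is None else base_env)
    # Hypothetical naming scheme: one namespace per tier, e.g. "acme-staging".
    env["EXO_LIBP2P_NAMESPACE"] = f"{prefix}-{environment}"
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching a node, so nodes from different tiers are started with different namespace values.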
Storage and Model Lifecycle Management
- Designing EXO_MODELS_DIRS and EXO_MODELS_READ_ONLY_DIRS strategies
- Mounting NFS or SAN shares as read-only model repositories for fast provisioning
- Garbage collection of stale caches and versioned weight retention policies
- Automating model pre-downloads and health checks before rolling updates
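A retention policy for cached weights might look like the sketch below: keep the most recently used versions plus anything explicitly pinned, and report the rest as deletion candidates. The function name, the timestamp input, and the version labels in the usage note are hypothetical; how EXO lays out its cache is not described in the outline.

```python
def select_for_deletion(versions, keep_latest=2, pinned=()):
    """Return version names to delete, oldest first.

    `versions` maps a version name to its last-used timestamp in seconds.
    The `keep_latest` most recently used versions and everything in
    `pinned` are retained.
    """
    by_recency = sorted(versions, key=lambda name: versions[name], reverse=True)
    keep = set(by_recency[:keep_latest]) | set(pinned)
    return [name for name in reversed(by_recency) if name not in keep]
```

For example, with last-used times `{"q4": 100, "q8": 300, "fp16": 200, "q2": 50}` and the defaults, the candidates are `["q2", "q4"]`; pinning `"q2"` leaves only `["q4"]`.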
Monitoring and Alerting
- Shipping EXO logs to centralized logging (ELK, Loki, or Splunk)
- Building Grafana dashboards from EXO_TRACING_ENABLED output
- Alerting on cluster membership changes, OOM events, and inference latency spikes
- Correlating macmon hardware telemetry with model performance regressions
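Alerting on inference latency spikes can be prototyped with a simple rolling baseline before wiring up a real alerting stack. This is a minimal sketch under stated assumptions: latencies arrive as a list of millisecond samples, and the window size and threshold factor are arbitrary illustrative defaults.

```python
from statistics import median


def latency_spikes(samples_ms, window=5, factor=2.0):
    """Return indices whose latency exceeds `factor` times the median
    of the preceding `window` samples."""
    spikes = []
    for i in range(window, len(samples_ms)):
        baseline = median(samples_ms[i - window:i])
        if samples_ms[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

Using the median rather than the mean keeps a single outlier from inflating the baseline, so the samples that follow a spike are still judged against normal behaviour.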
Update, Rollback, and Disaster Recovery
- Staging EXO binary updates in a canary node before fleet-wide rollout
- Model-level rollback: switching between quantized versions without re-downloading
- Backing up and restoring cluster state, custom namespaces, and cached weights
- Documenting recovery runbooks for total cluster rebuild scenarios
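The canary-first rollout described above can be expressed as a small gating function: the rest of the fleet is scheduled only after the canary has passed its health checks. The node names, the batch size, and the `fleet_batches` helper are hypothetical, and the health check itself is left out of the sketch.

```python
def fleet_batches(nodes, canary, canary_healthy, batch_size=2):
    """Return update batches for the non-canary nodes.

    Returns an empty list when the canary failed its checks, so nothing
    beyond the canary is updated.
    """
    if canary not in nodes:
        raise ValueError("canary must be one of the fleet nodes")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not canary_healthy:
        return []
    rest = [node for node in nodes if node != canary]
    return [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
```

Updating in small batches keeps most of the cluster on the previous version until each batch is confirmed healthy, which limits how much has to be rolled back.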
Security Hardening and Compliance
- Applying TLS at the reverse proxy layer (nginx, traefik) for the dashboard and API
- Implementing API rate limiting and IP whitelisting for EXO endpoints
- Isolating clusters with VLANs and zero-trust network policies
- Auditing access and maintaining an inventory of deployed models and versions
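The IP whitelisting topic above comes down to a membership test against a set of permitted networks. In a real deployment this check would normally live in the reverse proxy or firewall; the sketch below only illustrates the logic with Python's standard `ipaddress` module, and the CIDR ranges in the usage note are made-up examples.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """Return True if `client_ip` falls inside any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

For instance, `is_allowed("10.0.5.7", ["10.0.0.0/16"])` is true, while an address outside every listed range is rejected.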
Requirements
- Experience with DevOps practices (CI/CD, IaC, container orchestration)
- Familiarity with macOS or Linux system administration and package management
- Understanding of networking, DNS, and storage concepts
Audience
- DevOps engineers
- Infrastructure architects
- SREs responsible for on-premise AI workloads
21 Hours
Testimonials (2)
Craig was extremely engaged throughout the training, always making sure we were paying attention, adapting the examples to our day-to-day activities, and always giving an answer when asked, even when the information was not covered in the presentation.
Ecaterina Ioana Nicoale - BOOKING HOLDINGS ROMANIA SRL
Course - DevOps Foundation®
Machine translated
The trainer's high level of engagement and knowledge
Jacek - Softsystem
Course - DevOps Engineering Foundation (DOEF)®
Machine translated