Introduction to Cluster Computing
A guide to understanding cluster computing fundamentals
Login Node
- Entry point to the cluster
- For lightweight tasks:
  - Code editing
  - Git operations
  - File management
- ⚠️ Avoid heavy computations here!
Compute Nodes
- Main processing power
- Exclusive allocation
- Used for:
  - Code compilation
  - Heavy computations
  - Job execution
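salloc: Request an interactive allocation on a compute node
squeue: Check job status in the queue
sbatch: Submit a batch job script
scancel: Cancel a pending or running job
srun: Run a command on the allocated compute nodes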
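A minimal sketch of how these commands fit together, assuming the cluster runs SLURM (as the command names suggest). The script name `hello_job.sh`, the job name, and every resource value are illustrative placeholders, not site defaults. The submission step is skipped when `sbatch` is not on the PATH, so the sketch can be dry-run on any machine.

```shell
#!/usr/bin/env bash

# Write a small batch script. All values below are illustrative assumptions.
cat > hello_job.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=hello        # hypothetical job name
#SBATCH --nodes=1               # one compute node
#SBATCH --ntasks=1              # one task
#SBATCH --time=00:05:00         # wall-time limit (HH:MM:SS)
#SBATCH --output=hello_%j.out   # %j expands to the job ID

# srun launches the command on the allocated compute node.
srun hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print the job ID
    # (followed by the cluster name on multi-cluster setups).
    job_id=$(sbatch --parsable hello_job.sh)
    echo "Submitted job ${job_id}"

    # Check the status of your own jobs.
    squeue -u "$USER"

    # To cancel the job: scancel "${job_id}"
else
    echo "sbatch not found; wrote hello_job.sh without submitting"
fi
```

For interactive work, something like `salloc --nodes=1 --time=00:30:00` requests an allocation in the same way; the exact limits and partitions available depend on the site.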
Key Concepts
- CPU cores and layout
- Memory hierarchy (L1, L2, L3 cache)
- NUMA architecture
Analysis Tools
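- /proc/cpuinfo: per-CPU details exposed by the Linux kernel
- lscpu: summary of sockets, cores, threads, caches, and NUMA nodes
- hwloc (lstopo): hierarchical view of the hardware topology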
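A short sketch of how these tools might be used to inspect a node, assuming a Linux system. `lscpu` and hwloc's `lstopo-no-graphics` are only called if they are installed, and the `grep` filter on the `lscpu` output is deliberately loose because the layout differs between versions.

```shell
#!/usr/bin/env bash

# Count logical CPUs listed in /proc/cpuinfo; fall back to getconf
# if the file is missing or has no "processor" lines.
logical_cpus=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || true)
if [ -z "${logical_cpus}" ] || [ "${logical_cpus}" -eq 0 ]; then
    logical_cpus=$(getconf _NPROCESSORS_ONLN)
fi
echo "Logical CPUs: ${logical_cpus}"

# lscpu reports sockets, cores per socket, threads per core,
# cache sizes, and NUMA nodes.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -E 'Socket|Core|Thread|NUMA|L1|L2|L3' || true
fi

# lstopo-no-graphics ships with hwloc and prints the topology as a text tree;
# --no-io hides I/O devices so the CPU and cache layout is easier to read.
if command -v lstopo-no-graphics >/dev/null 2>&1; then
    lstopo-no-graphics --no-io
fi
```

Run this on a compute node rather than the login node, since the two often have different hardware.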
Q: Why can't I compile on the login node?
A: Login nodes are shared resources. Compilation is CPU-intensive and could affect other users.
Q: What happens if my job exceeds the wall time?
A: The scheduler forcefully terminates the job when it reaches its wall-time limit. Estimate conservatively: request somewhat more time than you expect the job to need.
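One way to make the estimate concrete is to time a test run and pad it with a safety margin. The helper below is a hypothetical sketch: the name `padded_walltime` and the 20 percent default margin are assumptions, not a SLURM feature or a site recommendation.

```shell
#!/usr/bin/env bash

# padded_walltime SECONDS [MARGIN_PERCENT]
# Hypothetical helper: adds a safety margin to a measured runtime and
# prints the result as HH:MM:SS, a format accepted by sbatch --time.
padded_walltime() {
    local measured=$1
    local margin=${2:-20}   # assumed default margin of 20 percent
    # Integer arithmetic, rounding up to the next whole second.
    local total=$(( (measured * (100 + margin) + 99) / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( total / 3600 )) $(( (total % 3600) / 60 )) $(( total % 60 ))
}

# A test run that took 50 minutes (3000 s), padded by the default margin:
padded_walltime 3000   # prints 01:00:00
```

The result can be passed on submission, for example `sbatch --time="$(padded_walltime 3000)" job.sh`, where `job.sh` stands for your own batch script.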
Q: How do I know how many resources to request?
A: Start small, test, and scale up based on your needs and job performance.
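To judge whether a request was too large or too small, it can help to compare what a finished job asked for with what it used. The sketch below assumes SLURM accounting is enabled and `sacct` is available on the cluster; the function name `show_job_usage` is hypothetical.

```shell
#!/usr/bin/env bash

# show_job_usage JOB_ID
# Hypothetical helper: prints requested versus consumed resources for a
# finished job. Requires SLURM accounting to be enabled on the cluster.
show_job_usage() {
    local job_id=$1
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found; is SLURM accounting available here?" >&2
        return 1
    fi
    # Timelimit vs Elapsed:  wall time requested vs wall time used.
    # ReqMem vs MaxRSS:      memory requested vs peak memory used.
    # AllocCPUS vs TotalCPU: cores allocated vs CPU time consumed.
    sacct -j "${job_id}" \
        --format=JobID,State,Timelimit,Elapsed,ReqMem,MaxRSS,AllocCPUS,TotalCPU
}
```

Calling `show_job_usage 123456` (a made-up job ID) after a small test run gives a starting point for the next, larger request.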