Introduction to Cluster Computing

A guide to understanding cluster computing fundamentals

Login Node

  • Entry point to the cluster
  • For lightweight tasks:
    • Code editing
    • Git operations
    • File management
  • ⚠️ Avoid heavy computations here!

Compute Nodes

  • Provide the cluster's main processing power
  • Exclusive allocation (reserved for your job while it runs)
  • For:
    • Code compilation
    • Heavy computations
    • Job execution
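
Slurm Commands

  • salloc: request an interactive allocation on a compute node
  • squeue: check the status of queued and running jobs
  • sbatch: submit a batch job script
  • scancel: cancel a pending or running job
  • srun: run a command on the allocated compute nodes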
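
A typical workflow with these commands might look like the sketch below. The job name, resource values, and file names (hello.sbatch, hello-%j.out) are illustrative assumptions, and site-specific options such as partition or account are omitted. The submit step only runs where Slurm is installed.

```shell
#!/bin/sh
# Write a minimal batch script (all values are placeholders to adapt).
cat > hello.sbatch <<'EOF'
#!/bin/sh
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

# srun launches the command on the allocated compute node.
srun hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print the job ID (optionally followed by ";cluster").
    jobid=$(sbatch --parsable hello.sbatch)
    jobid=${jobid%%;*}
    squeue -j "$jobid"      # check the job's status
    # scancel "$jobid"      # uncomment to cancel the job
else
    echo "Slurm not found; wrote hello.sbatch without submitting."
fi
```

For interactive work, something like `salloc --nodes=1 --time=00:30:00` requests a node for half an hour; the exact options your site requires may differ.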

Key Concepts

  • CPU cores and layout
  • Memory hierarchy (L1, L2, L3 cache)
  • NUMA architecture

Analysis Tools

  • /proc/cpuinfo
  • lscpu
  • hwloc
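
The tools above can be combined into a quick hardware survey, sketched here for a Linux node. The `lscpu` and `lstopo-no-graphics` steps are skipped when those programs are not installed, and the grep pattern for `lscpu` is only a rough filter, since its labels vary between versions.

```shell
#!/bin/sh
# Count logical CPUs by counting "processor" entries in /proc/cpuinfo.
logical_cpus=$(grep -c '^processor' /proc/cpuinfo)
echo "Logical CPUs: $logical_cpus"

# lscpu summarizes sockets, cores, threads, cache sizes, and NUMA nodes.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -E 'Socket|Core|Thread|NUMA|cache'
fi

# lstopo-no-graphics (part of hwloc) prints the topology tree as text.
if command -v lstopo-no-graphics >/dev/null 2>&1; then
    lstopo-no-graphics
fi
```

Run this on a compute node (for example inside an interactive allocation) rather than the login node, since the two often have different hardware.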

Q: Why can't I compile on the login node?

A: Login nodes are shared resources. Compilation is CPU-intensive and could affect other users.

Q: What happens if my job exceeds the wall time?

A: The scheduler forcefully terminates the job when the limit is reached. Estimate conservatively: request somewhat more time than you expect the job to need.
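
One way to make a conservative estimate is to time a test run and add a safety margin. The helper below is a hypothetical sketch, not part of Slurm: `padded_walltime` and its default margin of 25% are assumptions chosen for illustration.

```shell
#!/bin/sh
# padded_walltime SECONDS [PERCENT]
# Hypothetical helper: add a safety margin (default 25%) to a measured
# run time and print the result as HH:MM:SS for use with --time.
padded_walltime() {
    measured=$1
    margin=${2:-25}
    total=$(( measured + measured * margin / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# A test run took 40 minutes (2400 s); request 25% more.
limit=$(padded_walltime 2400)
echo "#SBATCH --time=$limit"    # prints: #SBATCH --time=00:50:00
```

A larger margin is sensible when run time depends heavily on the input data or on how busy the shared file system is.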

Q: How do I know how many resources to request?

A: Start with a small request, run a short test job, and scale up based on the run time, memory use, and CPU utilization you measure.
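
To compare what a finished job requested with what it used, Slurm's accounting command `sacct` can help, provided accounting is enabled on your cluster. The wrapper name `job_usage` below is a hypothetical convenience, and the job ID in the usage comment is a placeholder.

```shell
#!/bin/sh
# job_usage JOBID
# Hypothetical helper: show requested vs. used resources for a finished job.
job_usage() {
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found (is Slurm installed?)" >&2
        return 1
    fi
    # Elapsed vs. Timelimit: actual run time against the requested wall time
    # MaxRSS vs. ReqMem:     peak memory of a step against requested memory
    # TotalCPU vs. AllocCPUS: CPU time consumed against allocated cores
    sacct -j "$1" \
        --format=JobID,JobName,Elapsed,Timelimit,MaxRSS,ReqMem,TotalCPU,AllocCPUS,State
}

# Usage: job_usage 123456    (123456 is a placeholder job ID)
```

If Elapsed is far below Timelimit, or MaxRSS far below ReqMem, the next request can be smaller; if they are close, leave more headroom.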