Data for NVLink Tests

Post: When is NVLink Worth it?

Inference Data

config      split mode  model  avg PP (t/s)  avg TG (t/s)
x16_nvlink  tensor      gemma       1545.14        34.22
x16_p2p     tensor      gemma       1547.59        34.41
x16_shm     tensor      gemma       1198.89        33.75
x8_nvlink   tensor      gemma       1541.95        34.23
x8_p2p      tensor      gemma       1475.09        34.40
x8_shm      tensor      gemma       1220.64        33.69
x4_nvlink   tensor      gemma       1533.40        34.14
x4_p2p      tensor      gemma       1311.86        33.90
x4_shm      tensor      gemma       1117.43        33.28
x16_nvlink  tensor      llama        797.73        25.76
x16_p2p     tensor      llama        789.73        26.79
x16_shm     tensor      llama        598.05        26.85
x8_nvlink   tensor      llama        799.12        26.00
x8_p2p      tensor      llama        746.79        27.30
x8_shm      tensor      llama        617.71        27.14
x4_nvlink   tensor      llama        795.85        25.93
x4_p2p      tensor      llama        662.35        27.12
x4_shm      tensor      llama        565.83        26.92
x16_nvlink  layer       gemma       1664.07        21.59
x16_p2p     layer       gemma       1682.22        21.60
x16_shm     layer       gemma       1685.01        21.60
x8_nvlink   layer       gemma       1664.23        21.58
x8_p2p      layer       gemma       1681.06        21.59
x8_shm      layer       gemma       1687.19        21.59
x4_nvlink   layer       gemma       1653.51        21.54
x4_p2p      layer       gemma       1671.35        21.55
x4_shm      layer       gemma       1664.85        21.54
x16_nvlink  layer       llama        806.29        18.04
x16_p2p     layer       llama        815.16        18.09
x16_shm     layer       llama        806.65        18.06
x8_nvlink   layer       llama        813.97        18.04
x8_p2p      layer       llama        823.51        18.15
x8_shm      layer       llama        809.10        18.08
x4_nvlink   layer       llama        808.60        18.09
x4_p2p      layer       llama        818.82        18.15
x4_shm      layer       llama        802.40        18.02
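As a quick sanity check, the interconnect speedups can be computed directly from the table above. This is a minimal sketch using the avg PP figures from the tensor-split llama rows; the dictionary names are illustrative, not part of any benchmark tooling:

```python
# NVLink speedup over p2p and shared-memory transport for
# tensor-split prompt processing (avg PP, tokens/s), taken
# from the llama rows of the inference table.
pp = {
    "x16": {"nvlink": 797.73, "p2p": 789.73, "shm": 598.05},
    "x8":  {"nvlink": 799.12, "p2p": 746.79, "shm": 617.71},
    "x4":  {"nvlink": 795.85, "p2p": 662.35, "shm": 565.83},
}

for lanes, rates in pp.items():
    vs_p2p = rates["nvlink"] / rates["p2p"]  # speedup vs. PCIe p2p
    vs_shm = rates["nvlink"] / rates["shm"]  # speedup vs. shared memory
    print(f"{lanes}: {vs_p2p:.2f}x vs p2p, {vs_shm:.2f}x vs shm")
```

Note that NVLink's edge over p2p grows as the PCIe link narrows (roughly 1.01x at x16 but 1.20x at x4), while the layer-split rows show no comparable spread.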

Training Data

config      model      strategy  train_runtime (h:mm:ss)  train_steps_per_second
x16_nvlink  qwen       fsdp                      0:19:55                   0.167
x16_p2p     qwen       fsdp                      0:22:40                   0.147
x16_shm     qwen       fsdp                      0:58:14                   0.057
x8_nvlink   qwen       fsdp                      0:19:54                   0.168
x8_p2p      qwen       fsdp                      0:29:11                   0.114
x8_shm      qwen       fsdp                      1:01:15                   0.054
x4_nvlink   qwen       fsdp                      0:19:52                   0.168
x4_p2p      qwen       fsdp                      0:41:18                   0.081
x4_shm      qwen       fsdp                      1:13:25                   0.045
x16_nvlink  tinyllama  ddp                       0:20:25                   0.163
x16_p2p     tinyllama  ddp                       0:20:24                   0.163
x16_shm     tinyllama  ddp                       0:21:19                   0.156
x8_nvlink   tinyllama  ddp                       0:20:25                   0.163
x8_p2p      tinyllama  ddp                       0:20:30                   0.163
x8_shm      tinyllama  ddp                       0:20:58                   0.159
x4_nvlink   tinyllama  ddp                       0:20:25                   0.163
x4_p2p      tinyllama  ddp                       0:21:02                   0.158
x4_shm      tinyllama  ddp                       0:21:09                   0.158
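The same kind of comparison works for the training runs. A minimal sketch that parses the h:mm:ss runtimes from the qwen FSDP rows above and reports each transport's slowdown relative to NVLink (the `to_seconds` helper is ad hoc, written just for this data):

```python
# Slowdown of p2p and shared-memory transports relative to NVLink
# for the qwen FSDP runs, using the train_runtime column above.
def to_seconds(hms: str) -> int:
    """Parse an h:mm:ss runtime string into total seconds."""
    secs = 0
    for part in hms.split(":"):
        secs = secs * 60 + int(part)
    return secs

runtimes = {
    "x16": {"nvlink": "0:19:55", "p2p": "0:22:40", "shm": "0:58:14"},
    "x8":  {"nvlink": "0:19:54", "p2p": "0:29:11", "shm": "1:01:15"},
    "x4":  {"nvlink": "0:19:52", "p2p": "0:41:18", "shm": "1:13:25"},
}

for lanes, runs in runtimes.items():
    base = to_seconds(runs["nvlink"])
    p2p = to_seconds(runs["p2p"]) / base
    shm = to_seconds(runs["shm"]) / base
    print(f"{lanes}: p2p {p2p:.2f}x slower, shm {shm:.2f}x slower")
```

FSDP's gradient and parameter shards make it far more interconnect-bound than the DDP tinyllama runs, where the worst case is only about a minute slower than the best.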