A PyTorch Native LLM Training Framework
An Industrial-Level Framework for Ease of Use
- veScale is rooted in PyTorch-native data structures, operators, and APIs, and thus inherits the PyTorch ecosystem that dominates the ML world.
- veScale decouples distributed-system design from model architecture, requiring zero or near-zero modification to users' model code.
- veScale provides single-device semantics to users, automatically distributing and orchestrating model execution across a cluster of devices.
- veScale parallelizes model execution with a combination of strategies (tensor, sequence, data, ZeRO, and pipeline parallelism) under semi- or full automation [coming soon].
- veScale supports not only Eager-mode automation for parallel training and inference but also Compile mode for ultimate performance [coming soon].
- veScale manages distributed checkpoints automatically, with online resharding across different cluster sizes and parallelism strategies.
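To make the last point concrete, here is a minimal pure-Python sketch of what "online resharding" means conceptually: a checkpoint written under one sharding layout can be reloaded under another. This is not veScale's actual implementation or API; `shard` and `reshard` are hypothetical helpers, and real frameworks operate on tensors and multi-dimensional device meshes rather than flat lists.

```python
# Conceptual sketch of checkpoint resharding (hypothetical helpers,
# not veScale's API): a 1-D "parameter" of 12 elements is saved as
# 4 shards by one job and reloaded as 3 shards by a smaller job.

def shard(tensor, world_size):
    """Split a flat parameter evenly across `world_size` ranks."""
    n = len(tensor) // world_size
    return [tensor[i * n:(i + 1) * n] for i in range(world_size)]

def reshard(shards, new_world_size):
    """Reassemble the global parameter, then re-split it for the new cluster."""
    full = [x for s in shards for x in s]  # gather shards into the global view
    return shard(full, new_world_size)

param = list(range(12))          # a parameter with 12 elements
saved = shard(param, 4)          # checkpoint written by a 4-device job
loaded = reshard(saved, 3)       # the same checkpoint loaded by a 3-device job

print(loaded)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

The key idea is that the checkpoint logically represents the global parameter, so the loading job can re-split it to match its own cluster size and parallelism layout without any manual conversion step.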