A PyTorch Native LLM Training Framework
An Industrial-Level Framework for Ease of Use
- veScale is rooted in PyTorch-native data structures, operators, and APIs, and thus inherits the PyTorch ecosystem that dominates the ML world.
- veScale decouples distributed-system design from model architecture, requiring zero or near-zero modification to users' model code.
- veScale provides single-device semantics to users, automatically distributing and orchestrating model execution across a cluster of devices.
- veScale parallelizes model execution with a combination of strategies (tensor, sequence, data, ZeRO, and pipeline parallelism) under semi- or full automation [coming soon].
- veScale supports not only Eager-mode automation for parallel training and inference but also Compile mode for ultimate performance [coming soon].
- veScale manages distributed checkpoints automatically, with online resharding across different cluster sizes and parallelism strategies.
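To make the last point concrete, here is a minimal pure-Python sketch of what "online resharding" means conceptually: a checkpoint written under one sharding layout can be reloaded under another. This is not veScale's actual implementation or API; `shard` and `reshard` are hypothetical helpers, and real frameworks operate on tensors and multi-dimensional device meshes rather than flat lists.

```python
# Conceptual sketch of checkpoint resharding (hypothetical helpers,
# not veScale's API): a 1-D "parameter" of 12 elements is saved as
# 4 shards by one job and reloaded as 3 shards by a smaller job.

def shard(tensor, world_size):
    """Split a flat parameter evenly across `world_size` ranks."""
    n = len(tensor) // world_size
    return [tensor[i * n:(i + 1) * n] for i in range(world_size)]

def reshard(shards, new_world_size):
    """Reassemble the global parameter, then re-split it for the new cluster."""
    full = [x for s in shards for x in s]  # gather shards into the global view
    return shard(full, new_world_size)

param = list(range(12))          # a parameter with 12 elements
saved = shard(param, 4)          # checkpoint written by a 4-device job
loaded = reshard(saved, 3)       # the same checkpoint loaded by a 3-device job

print(loaded)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

The key idea is that the checkpoint logically represents the global parameter, so the loading job can re-split it to match its own cluster size and parallelism layout without any manual conversion step.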