Distributed and Parallel Training Tutorials¶

Distributed training is a model training paradigm that spreads the training workload across multiple worker nodes, which can significantly speed up training and, by making larger models and datasets practical, can improve model accuracy. While distributed training can be used for any type of ML model training, it is most beneficial for large models and compute-demanding tasks such as deep learning.
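As a rough illustration of the data-parallel idea behind this paradigm, the sketch below simulates several workers inside a single process: each "worker" computes gradients on its own equal-sized shard of a batch, and the averaged gradients match the gradient of the full batch. The helper names (`full_batch_grad`, `averaged_shard_grads`) and the toy model are hypothetical and exist only for this example; real distributed training runs the workers as separate processes and averages gradients with collective communication.

```python
import torch
import torch.nn as nn


def full_batch_grad(model, inputs, targets):
    """Gradient of the mean-squared-error loss over the whole batch."""
    model.zero_grad()
    nn.functional.mse_loss(model(inputs), targets).backward()
    return [p.grad.clone() for p in model.parameters()]


def averaged_shard_grads(model, inputs, targets, num_workers):
    """Simulate data parallelism: one gradient per shard, then average."""
    per_worker = []
    shards = zip(inputs.chunk(num_workers), targets.chunk(num_workers))
    for shard_inputs, shard_targets in shards:
        model.zero_grad()
        nn.functional.mse_loss(model(shard_inputs), shard_targets).backward()
        per_worker.append([p.grad.clone() for p in model.parameters()])
    # Average each parameter's gradient across the simulated workers.
    return [torch.stack(grads).mean(dim=0) for grads in zip(*per_worker)]


torch.manual_seed(0)
model = nn.Linear(4, 2)  # toy model, an assumption for illustration
inputs, targets = torch.randn(8, 4), torch.randn(8, 2)

full = full_batch_grad(model, inputs, targets)
averaged = averaged_shard_grads(model, inputs, targets, num_workers=4)
grads_match = all(
    torch.allclose(a, b, atol=1e-6) for a, b in zip(full, averaged)
)
```

The equality relies on the shards being the same size and the loss being a mean over samples; with uneven shards, a plain average of per-worker gradients would no longer equal the full-batch gradient.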

There are several ways to perform distributed training in PyTorch, each with its own advantages in certain use cases:

DistributedDataParallel (DDP)
Fully Sharded Data Parallel (FSDP)
Tensor Parallel (TP)
Device Mesh
Remote Procedure Call (RPC) distributed training
Custom Extensions

Read more about these options in Distributed Overview.

Learn DDP¶

DDP Intro Video Tutorials

A step-by-step video series that starts with the basics of DistributedDataParallel and advances to more complex topics.

Code Video

https://tutorials.pytorch.kr/beginner/ddp_series_intro.html?utm_source=distr_landing&utm_medium=ddp_series_intro

Getting Started with Distributed Data Parallel

This tutorial provides a short and gentle introduction to PyTorch DistributedDataParallel.

Code

https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html?utm_source=distr_landing&utm_medium=intermediate_ddp_tutorial
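For orientation before opening the tutorial, here is a minimal sketch of wrapping a model in DistributedDataParallel. It makes several simplifying assumptions that are not taken from the tutorial: a single process (`world_size=1`), the CPU `gloo` backend, a toy linear model, and hypothetical helper names (`_free_port`, `run_single_process_ddp`). A real job would typically launch one process per GPU, for example with `torchrun`.

```python
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def _free_port():
    """Ask the OS for an unused TCP port for the rendezvous (helper for this sketch)."""
    with socket.socket() as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def run_single_process_ddp(steps=5):
    """Train a toy model wrapped in DDP and return the loss at each step."""
    # Assumption: one process acting as rank 0 of a world of size 1, on CPU.
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://127.0.0.1:{_free_port()}",
        rank=0,
        world_size=1,
    )
    try:
        torch.manual_seed(0)
        model = DDP(nn.Linear(4, 2))  # DDP averages gradients across ranks in backward()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        inputs, targets = torch.randn(8, 4), torch.randn(8, 2)

        losses = []
        for _ in range(steps):
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(model(inputs), targets)
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
        return losses
    finally:
        # Always tear down the process group, even if training fails.
        dist.destroy_process_group()
```

Calling `run_single_process_ddp()` returns the per-step losses. With more than one rank, each process would run the same function with its own `rank` and its own shard of the data.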

Distributed Training with Uneven Inputs Using the Join Context Manager

This tutorial describes the Join context manager and demonstrates its use with DistributedDataParallel.

Code

https://tutorials.pytorch.kr/advanced/generic_join.html?utm_source=distr_landing&utm_medium=generic_join

Learn FSDP¶

Getting Started with FSDP

This tutorial demonstrates how to perform distributed training with FSDP on the MNIST dataset.

Code

https://tutorials.pytorch.kr/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started

FSDP Advanced

In this tutorial, you will learn how to fine-tune a HuggingFace (HF) T5 model with FSDP for text summarization.

Code

https://tutorials.pytorch.kr/intermediate/FSDP_adavnced_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_advanced

Learn Tensor Parallel (TP)¶

Large Scale Transformer model training with Tensor Parallel (TP)

This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.

Code

https://tutorials.pytorch.kr/intermediate/TP_tutorial.html