February 23, 2022
Introducing TorchRec, a library for modern production recommendation systems
We are excited to announce TorchRec, a PyTorch domain library for Recommendation Systems. This new library provides common sparsity and parallelism primitives, enabling researchers to build state-of-the-art personalization models and deploy them in production.
February 08, 2022
Practical Quantization in PyTorch
Quantization is a cheap and easy way to make your DNN run faster and with lower memory requirements. PyTorch offers a few different approaches to quantize your model. In this blog post, we’ll lay a (quick) foundation of quantization in deep learning, and then take a look at what each technique looks like in practice. Finally, we’ll end with recommendations from the literature for using quantization in your workflows.
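As a rough illustration of one of those approaches, here is a minimal sketch of post-training dynamic quantization; the toy two-layer model is only a placeholder, not an example from the post.

```python
import torch
import torch.nn as nn

# Illustrative toy model (assumption: any module with nn.Linear layers).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Replace the Linear layers with dynamically quantized int8 equivalents.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```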
December 22, 2021
Introducing TorchVision’s New Multi-Weight Support API
TorchVision has a new backwards-compatible API for building models with multi-weight support. The new API allows loading different pre-trained weights on the same model variant, keeps track of vital metadata such as the classification labels, and includes the preprocessing transforms necessary for using the models. In this blog post, we plan to review the prototype API, showcase its features, and highlight key differences from the existing one.
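To give a flavor of the idiom, here is a minimal sketch using the names from the multi-weight API as it later stabilized in torchvision (e.g. ResNet50_Weights); the prototype API described in the post may differ in module paths and enum names.

```python
from torchvision.models import resnet50, ResNet50_Weights

# Pick a specific pre-trained weight set for the same model variant.
weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights)
model.eval()

# The weights object bundles the matching preprocessing transforms
# and metadata such as the classification labels.
preprocess = weights.transforms()
categories = weights.meta["categories"]
```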
December 15, 2021
Efficient PyTorch: Tensor Memory Format Matters
Ensuring the right memory format for your inputs can significantly impact the running time of your PyTorch vision models. When in doubt, choose the Channels Last memory format.
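As a minimal sketch of that recommendation, both the model and its input can be converted to the Channels Last memory format before inference; the resnet18 model here is just an illustrative choice.

```python
import torch
import torchvision.models as models

# Convert the model and the input tensor to channels-last (NHWC) layout.
model = models.resnet18().eval().to(memory_format=torch.channels_last)
x = torch.randn(8, 3, 224, 224).to(memory_format=torch.channels_last)

with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([8, 1000])
```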
December 08, 2021
Announcing the Winners of the 2021 PyTorch Annual Hackathon
More than 1,900 people worked hard in this year’s PyTorch Annual Hackathon to create unique tools and applications for PyTorch developers and researchers.
October 26, 2021
Accelerating PyTorch with CUDA Graphs
Today, we are pleased to announce that a new advanced CUDA feature, CUDA Graphs, has been brought to PyTorch. Modern DL frameworks have complicated software stacks that incur significant overheads associated with the submission of each operation to the GPU. When DL workloads are strong-scaled to many GPUs for performance, the time taken by each GPU operation diminishes to just a few microseconds and, in these cases, the high work submission latencies of frameworks often lead to low utilization of ...
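As a minimal sketch of the capture-and-replay pattern with the torch.cuda.CUDAGraph API (requires a CUDA-capable GPU), the tiny linear model and static tensors below are placeholders, not examples from the post.

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(64, 64).to(device)
static_input = torch.randn(32, 64, device=device)

# Warm up on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture the forward pass once into a graph...
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# ...then copy new data into the captured input and replay the whole
# graph with a single launch, avoiding per-op submission overhead.
static_input.copy_(torch.randn(32, 64, device=device))
g.replay()
print(static_output.sum().item())
```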