
November 16 | Virtual & Free

SigOpt AI & HPC Summit 2021


Talk: Faster, Better Training for Recommendation Systems

3:00 - 3:30 pm (Pacific Time)

DLRM (Deep Learning Recommendation Model) is a deep learning-based recommendation model introduced and open-sourced by Facebook. It is one of the state-of-the-art recommendation models and is part of the MLPerf training benchmark. The DLRM workload poses unique challenges for single-socket and multi-socket distributed training because it must balance a mixture of compute-bound, memory-bound, and I/O-bound operations. To tackle this, we implemented an efficient scale-out solution for DLRM training on Intel Xeon clusters. It includes innovative data and model parallelization, a new hybrid splitSGD + LAMB optimizer, efficient hyperparameter tuning for model convergence at much larger global batch sizes, and novel data loader techniques that support both scale-up and scale-out. According to our MLPerf v1.0 training result, we can train DLRM on 64 Xeon Cooper Lake 8376H processors in 15 minutes, a 3X improvement over our MLPerf v0.7 submission on 16 Xeon Cooper Lake 8380 processors. In this talk, Ke will discuss DLRM, the unique challenges it poses, and the optimizations that accelerate its training.




Ke Ding

Principal AI Engineer, Intel, SATG MLP – Applied ML team