
November 16 | Virtual & Free

SigOpt AI & HPC Summit 2021


Talk: Faster, Better Training for Recommendation Systems

3 - 3:30pm

DLRM (Deep Learning Recommendation Model) is a deep learning-based recommendation model introduced and open-sourced by Facebook. It is one of the state-of-the-art models and part of the MLPerf training benchmark. The DLRM workload poses unique challenges for single-socket and multi-socket distributed training because it must balance a mixture of compute-bound, memory-bound and I/O-bound operations. To tackle this, we implemented an efficient scale-out solution for DLRM training on Intel Xeon clusters that includes innovative data and model parallelization, new hybrid splitSGD + LAMB optimizers, efficient hyperparameter tuning for model convergence with a much larger global batch size, and novel data loader techniques to support scale-up and scale-out. According to the MLPerf v1.0 training results, we can train DLRM with 64 Xeon Cooper-Lake 8376H processors in 15 minutes, a 3X improvement over our MLPerf v0.7 submission with 16 Xeon Cooper-Lake 8380 processors. In this talk, Ke will discuss DLRM, the unique challenges associated with it, and the optimizations that drive training performance acceleration.

11/16/2021, 3:00 pm to 3:30 pm, America/Los_Angeles


Ke Ding

Principal AI Engineer, Intel, SATG MLP – Applied ML team