This talk will focus on practical, real-world considerations in maximizing the training speed of deep learning recommender engines. Training deep learning recommenders at scale introduces an interesting set of challenges because of potential imbalances between compute and communication resources on many training platforms. Our experience benchmarking the DLRM workload for MLPerf on TensorFlow/TPUs will serve as a case study. In addition, we will use the lessons learned to suggest best practices for choosing efficient design points when tuning recommender architectures.
Tayo Oguntebi is a software engineer at Google, focusing on systems for efficient and high-performance training of deep learning models. Most recently, he has been thinking about problems at the intersection of sparsity and deep learning. Prior to Google, Tayo completed a PhD at Stanford University with a focus on domain-specific computer architectures for graph analytics. He has also dabbled in software platforms for annealing quantum computing. In his meager spare time, he enjoys the outdoors, sports, and spending time with his wife.