This talk will present the low-precision techniques, analysis, and toolchain we explored to optimize the performance of production-scale recommendation models while meeting their stringent accuracy requirements. We also share the unique challenges and lessons learned from deploying Facebook’s production recommendation models in low precision on existing hardware platforms, including CPUs and accelerators. We hope that the methodologies we share are applicable to many ML domains and to low-precision architectures in general.
Summer Deng is a research scientist at Facebook, working on low-precision optimizations for machine learning inference on CPUs and accelerators. Before joining Facebook, Summer received her PhD from UC Santa Barbara in 2017. Her research focused on statistical methods to guide computer architecture design. Her current interests lie broadly at the intersection of machine learning and computer architecture.