A Retrospective on the AMPLab and the Berkeley Data Analytics Stack
Friday, October 28, 2016, 8:00 PM to 9:30 PM UTC
The Algorithms, Machines and People Laboratory (AMPLab) was launched by a group of systems and machine learning faculty at UC Berkeley in early 2011 and was awarded an NSF CISE Expeditions in Computing grant in 2012. The lab's goal is to develop a new approach to large-scale data analytics (i.e., Big Data processing) that seamlessly integrates the three main resources available for making sense of data at scale: Algorithms (machine learning, statistical, and query processing techniques), Machines (scalable clusters and elastic cloud computing), and People (both as individual analysts and as crowds).
The lab has had a significant impact on the Big Data software landscape through the development of a freely available, open-source software stack called BDAS: the Berkeley Data Analytics Stack. BDAS is a comprehensive analytics platform that has incubated a number of influential systems, including the Mesos cluster resource manager (now Apache Mesos), the Spark in-memory computation framework (now Apache Spark), and the Tachyon distributed storage system (now called Alluxio). It contains interfaces for streaming analytics, distributed machine learning, high-performance SQL processing, and graph analytics, among others. BDAS has served as a unifying artifact for dozens of Ph.D. students and postdoctoral researchers, and its software also features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an academic project.
The AMPLab is now in the final year of its planned six-year existence. In this talk I will provide an overview of AMPLab and BDAS, focusing on the overarching themes of this large software research project. I will highlight some risks we took and a few that we didn't, and then offer some thoughts on the future of data analytics systems and on the best ways for academic researchers to influence that future.