scale.bythebay.io

Scale By the Bay, Online, October 28-29, 2021

      

SBTB 2021 Program is Up!

Scale By the Bay (SBTB) is in its 9th year.

See the 2021 Scale By the Bay Program

When we started, Big Data was Hadoop, with Spark and Kafka quite new and uncertain. Deep Learning was in the lab, and distributed systems were managed by a menagerie of sysadmin tools such as Ansible, Salt, Puppet and Chef. Docker and Kubernetes were in the future, but Mesos had proven itself at Twitter, and a young startup called Mesosphere was bringing it to the masses. Another thing proven at Twitter, as well as in Kafka and Spark, was Scala, but the golden era of functional programming in industry was still ahead of us.

AI was still quite unglamorous Machine Learning, Data Mining, Analytics, and Business Intelligence.

But the key themes of SBTB were already there:

The overarching idea of SBTB is that all great scalable systems are a combination of all three. The notions pioneered by Mesos became Kubernetes and its CNCF ecosystem. Scala took hold in industry alongside Haskell, OCaml, Clojure, and F#. New languages like Rust and Dhall emerged with similar ideas and ideals. Data pipelines were formed around APIs, REST and GraphQL, and tools like Apache Kafka. ML became AI, and every scaled business application became an AI application.

SBTB tracks the evolution of the state of the art in all three of its tracks, nicknamed Functional, Cloud, and Data. The core idea is still making distributed systems solve complex business problems at web scale, doable by small teams of inspired and happy software engineers. Happiness comes from learning, technology choices automating away the mundane, and a scientific approach to the field. We see the arc of learning elevating through the years, as functional programming concepts drive deep into category theory, type systems are imposed on deep learning frameworks and tensors, middleware is abstracted via GraphQL formalisms, compute becomes serverless, AI hits the road as model deployment, and so on. Let’s visit some of the highlights of this evolution in the 2021 program.

FP for ML/AI

As more and more decisions are entrusted to AI, the need to understand what happens inside deep learning systems becomes ever more urgent. While Python remains the Data Science API of choice, the underlying libraries are written in C++. The Hasktorch team shares their approach to exposing PyTorch capabilities in Haskell, building up to transformers with gradual typing. The clarity of composable representations of deep learning systems will warm many a heart tested by industry experience, where types ensure safety and clarity.

AI

We learn how Machine Learning is used to predict financial time series. We consider the bias in AI and hardware vs software directions of its acceleration. We show how an AI platform can be built from scratch using OSS tools. Practical AI deployment is covered by DVC experiments. We look at the ways Transformers are transforming Autodesk. We see how Machine Learning is becoming reproducible with MLOps at Microsoft. We even break AI dogma with Apache NLPCraft.

Cloud

Our cloud themes include containers with serverless functions, a serverless query engine, event-driven patterns for microservices, and a series of practical stacks. We review the top CNCF projects to watch. Evergreen, formidable challenges like data center migration to the cloud at Workday scale are presented by the lead engineers who made it happen. Fine points of scalability are explored beyond auto-scaling. We look at stateful reactive streams with Akka and Kafka, and the ways to retrofit your Java applications with reactive pipelines for more efficiency. See how Kubernetes can spark joy for your developers.

Core OSS Frameworks

As always, we present the best practices for deploying OSS projects that our communities adopted before the rest – Spark, Kafka, Druid – integrating them into data pipelines and tuning for the best performance and ML integration at scale. We cover multiple aspects of tuning Spark performance and using PySpark with location and graph data. We rethink the whole ML ecosystem with Spark. We elucidate patterns of Kafka deployments for building microservice architectures.

Software Engineering

Programming language highlights include the Scala 3 transition illuminated by Dean Wampler and Bill Venners, Meaning for the Masses from Twitter, purity spanning frontend to backend, using type safety for tensor calculus in Haskell and Scala, using Rust for WebAssembly, a categorical view of ADTs, distributed systems and tracing in Swift, complex codebase troubleshooting, dependent and linear types, declarative backends, efficient arrays in Scala 3, and using GraalVM to optimize ML serving. We are also diving into Swift for distributed systems with its core team.

Other Topics

We look at multidimensional clustering, the renaissance of relational databases, cloud SQL and data lakes, location and graph data, meshes, and other themes.

There are fundamental challenges that will face the industry for years to come, such as AI bias, which we rigorously explore, hardware and software codevelopment for AI acceleration, and moving large enterprise codebases from on-prem to the cloud, as we see with Workday.

The companies presenting include Apple, Workday, Nielsen, Uber, Google Brain, Nvidia, Domino Data Labs, Autodesk, Twitter, Microsoft, IBM, Databricks, and many others.

Reserve your pass today