

Thursday, November 12
 

6:30am PST

Grand Welcome and Opening Remarks
Grand welcome to the first global online Scale By the Bay held across two time zones, from the founder and organizer Dr. Alexy Khrabrov and onsite producer Oli Makhasoeva.

Speakers

Alexy Khrabrov

Program Chair, Reactive Summit

Margarita

CEO, Konfy
My journey as an organizer started with a small scientific conference in Novosibirsk, Russia. Since then I’ve been focusing on building communities around the globe. How do we build a caring community? In order to build a safe home, you need a strong construction. We are planning every...


Thursday November 12, 2020 6:30am - 6:45am PST
cloud

6:45am PST

Countdown to 3
The first milestone of Scala 3.0 was published last week and we expect the first release candidate in December. The community has invested a great effort to polish the new language and its tooling and to make sure that many libraries work with it from the start.

In my talk I give a status report of what has been achieved and what still remains to be done. I give a timetable for how we expect the 3.0 rollout to work, and provide a glimpse of how Scala will evolve after this important release.

Speakers

Martin Odersky

Professor, EPFL
Martin Odersky is the creator of Scala programming language and a professor at EPFL.


Thursday November 12, 2020 6:45am - 7:45am PST
cloud

8:00am PST

C-chain: the Integration of 5G and real time Blockchain
Many IoT applications require communication and recording of events, both in real time. The recording of events must be immutable and legally binding, which calls for blockchain technology. 

5G offers real time, but classical blockchain techniques are far from real time. C-chain is a lean, high-performance blockchain variant that achieves real time in booking machine transactions in a blockchain data structure and DLTs. Therefore, 5G and C-chain are a perfect match for new IoT applications.

This talk gives an overview of the basic principles of the C-chain technology and presents several application scenarios from the fields of driver assistance and autonomous vehicles. 
The C-chain technology was developed at TUM and has recently been transferred to the new startup catena GmbH.

Speakers

Rudolf Bayer

Professor, TU Munich
Rudolf Bayer is professor emeritus of Informatics at the Technical University of Munich where he had been employed since 1972. He is noted for inventing three data sorting structures: the B-tree (with Edward M. McCreight), the UB-tree (with Volker Markl) and the red-black tree. Ba...


Thursday November 12, 2020 8:00am - 8:30am PST
cloud

8:00am PST

Monocle 3: a peek into the future
Optics is a popular topic among library authors; they exist in at least four flavours in Scala: Monocle, Quicklens, Shapeless and Scalaz. Yet end-user applications rarely use them. In this talk, I will discuss the shortcomings of Monocle, a library I created six years ago. And then, I will present game-changing updates that are coming for the next version of Monocle, specifically for Dotty/Scala 3.

Speakers

Julien Truffaut

Functional Scala instructor, FP Tower
I am a backend engineer with more than 10 years of experience in companies of all sizes, from start-ups to tech giants like Amazon. For the last 5 years, I have been running functional programming training workshops with great success. I focus my training on simple functional programming...


Thursday November 12, 2020 8:00am - 8:30am PST
functional

8:45am PST

Project Loom? Better Futures? What’s next for JVM concurrent programming
Project Loom’s goal is to bring lightweight threads and continuations to the JVM. Meanwhile, Kotlin has developed coroutines, Scala has a rich ecosystem of functional concurrency toolkits, and in Java, programming with `CompletableFuture`s is becoming the norm. The JVM concurrency scene is getting crowded! This raises the question: what’s next? What are the problems that need to be solved? How can Project Loom disrupt the status quo? Will `IO`s still have a role to play in a post-Loom functional JVM? In the talk we will explore the changes that Project Loom aims to bring to the JVM. We’ll see which problems are solved by virtual threads, and which problems remain. We’ll also try to speculate how Loom might impact the way we do concurrent programming, and why we might want to leverage lazily evaluated, functional effect descriptions. Come and see what’s next for concurrency on the JVM!

Speakers

Adam Warski

CTO, SoftwareMill
I am one of the co-founders of SoftwareMill, where I code mainly using Scala and other interesting technologies. I am involved in open-source projects, such as sttp, tapir, Quicklens, ElasticMQ and others. I have been a speaker at major conferences, such as JavaOne, LambdaConf, Devoxx...


Thursday November 12, 2020 8:45am - 9:15am PST
cloud

8:45am PST

Materialize Typeclasses with Magnolia
Magnolia is a modern way of materializing type classes for any data type composed of products (case classes) and coproducts (traits). It is a simple yet powerful and performant library. Heavily functional libraries like Cats and Algebird depend on type classes. Most of the time, developing a Scala library requires writing library-specific type classes, and library users need to implement their domain-specific instances of these type classes. This step can be automated with Magnolia. At Spotify, we use Magnolia for compile-time type class generation that maps many storage types (Avro, Protobuf, BigQuery, etc.) to Scala case classes for use with Scio. This removes a lot of boilerplate code that users would otherwise have to handcraft. The code is open source and available in the spotify/magnolify GitHub repository. Let's get our hands dirty and implement a case class diffing library that outputs the difference between any given case classes. Here we will use Magnolia to derive the type classes for any given data type (product or coproduct). We will also talk about the limitations of the Magnolia library and how to work around some of them.

Speakers

Shameera Rathnayaka

Senior Engineer, Spotify
I am a Senior Data Infrastructure Engineer at Spotify and a member of the Scio (idiomatic Scala wrapper for Apache Beam) development team. Open source enthusiast, a contributor to many open-source projects, and an Apache PMC member. A couple of years ago, I started to work with Scala...


Thursday November 12, 2020 8:45am - 9:15am PST
functional

9:30am PST

Scaling Databricks to Run Data and AI Workloads on Millions of VMs
Cloud service developers need to handle massive scale workloads from thousands of customers with no downtime or regressions. In this talk, I’ll present our experience building a very large-scale cloud service at Databricks, which provides a data and ML platform service used by many of the largest enterprises in the world. Databricks manages millions of cloud VMs that process exabytes of data per day for interactive, streaming and batch production applications. This means that our control plane has to handle a wide range of workload patterns and cloud issues such as outages. We will describe how we built our control plane for Databricks using Scala services and open source infrastructure such as Kubernetes, Envoy, and Prometheus, and various design patterns and engineering processes that we learned along the way. In addition, I’ll describe how we have adapted data analytics systems themselves to improve reliability and manageability in the cloud, such as creating an ACID storage system that is as reliable as the underlying cloud object store (Delta Lake) and adding autoscaling and auto-shutdown features for Apache Spark.

Speakers

Matei Zaharia

Chief Technologist, Databricks
Matei Zaharia is an Assistant Professor of Computer Science at Stanford and Co-founder and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley, and has worked on other widely used open source data analytics and AI software including...


Thursday November 12, 2020 9:30am - 10:30am PST
cloud

10:45am PST

Acting Lessons for Scala Engineers with Akka and ZIO
Actors are a cornerstone of modern concurrent and distributed systems, propelling both Erlang and Scala to early success. Exposing a model that doesn’t require locks, and which doesn’t block threads, actors have helped developers build software without worrying about deadlocks, thread management, and other woes of modern software development. In this presentation, Salar Rahmanian will provide an introduction to actors, Akka, and zio-actors, and show you how modern functional effect systems like ZIO allow a reimagining of the actor paradigm, in ways that increase type safety, allow coordination of state changes across multiple actors, and still preserve the benefits that brought developers to the actor paradigm.

Speakers

Salar Rahmanian

Software Engineer, Collective Health
I have been developing software since the age of eleven and have over 20 years of commercial experience. My passion and expertise are focused on functional programming and building concurrent and distributed systems using Scala. I am a core developer for the ZIO Scala Library for asynchronous...


Thursday November 12, 2020 10:45am - 11:15am PST
cloud

10:45am PST

Reproducible Data Pipelines Using Controlled Effects
As the data science and machine learning fields have grown over the past decade, so has the number of data pipelining frameworks which allow users to chain together a DAG of tasks. The types of tasks that one can define are nearly endless. They can range from performing pure computations like calculating the average over a window in a stream of data to performing impure actions such as writing files or executing a database query. While existing data pipelining libraries are powerful, they often suffer from a lack of reproducibility. How can one guarantee that a pipeline is reproducible when it is able to execute arbitrary side effects?

In this talk, we present a new architecture for data pipelining frameworks which promotes reproducibility along with an implementation in Haskell, kernmantle. We discuss how this architecture allows you to explicitly control which effects a pipeline is allowed to execute and how they are executed. Such effects may include file and network access, random number generation, parallel execution, and more. This framework also opens the door for config-time interpretation, which allows a pipeline to be analyzed at load-time, prior to its execution.

Speakers

Yves Parès

Software Engineer, Tweag
I'm Yves Parès, a 30 y.o. software engineer, and I've been on a strict FP-only diet for about 10 years, to the point I now have trouble processing imperative programming. My go-to meal is some closures as a starter, a big stack of monad as the main course and usually a monoid for...


Thursday November 12, 2020 10:45am - 11:15am PST
functional

11:30am PST

When the Only Way to Scale up, is to Scale Down
Big Data has long been hailed as the ultimate path to a glorious AI future, and getting more data has now been the one answer given for almost a decade by virtually all data scientists asked how they planned to improve their model's performance. Yet, with ever larger datasets to work with, we might reach a point where building better models calls for a total change in paradigm. Even though data storage and compute have been cheap and easily accessible, are we heading towards a world where this is no longer true? And what happens to Machine Learning then? Once we acknowledge that more and more companies are seeing their AI ROI melt away year after year because of the humongous associated operational and technological costs, this might not sound like Science Fiction anymore...

Speakers

Jennifer Prendki

Founder & CEO, Alectio
Jennifer is the founder and CEO of Alectio, the first startup focused on automated data curation and data collection optimization. She and her team are on a mission to help ML teams build models with less data. Prior to Alectio, Jennifer was the VP of Machine Learning at Figure Eight...


Thursday November 12, 2020 11:30am - 12:00pm PST
cloud

11:30am PST

TypeScript - You'll Like It
TypeScript is a popular language for Front-End development and building cloud infrastructure. In this talk I'll give a Scala-friendly overview of the language and several examples of how adding TypeScript familiarity will allow you to build modern websites and infrastructure while retaining all the functional programming and concise syntax features you enjoy with languages commonly used for services and applications.

Speakers

Jason Swartz

Edge EM, Twitch


Thursday November 12, 2020 11:30am - 12:00pm PST
functional

12:00pm PST

Lunch break
Mingle in the hallway track!

Thursday November 12, 2020 12:00pm - 12:30pm PST
cloud

12:30pm PST

Will AI Kill Programming?
With the rise of Data Science and AI, most apps are now data pipelines feeding AI decision-making.
Will programming of those data pipelines become a commodity like plumbing, only to be replaced with "nocode" that the data scientists will operate with NLP UIs?

On the other hand, as AI in production actually means model deployment, and data scientists learn the best practices of software engineering, including test-driven development, continuous delivery, and cloud-native workflows, the lines become ever more blurred.

Will AI become a kind of programming? Will one kill and subsume the other? Our veteran debate team will fight for you to get the answers!

Vitaly Gordon, the moderator, has led the previous two years' debates to resounding victories -- first, FP in ML, and then, Will it all be Serverless in 10 years? And now we take on AI and Programming, the two pillars on which it all stands. You cannot miss this panel.

Moderators

Vitaly Gordon

CEO, Faros AI

Speakers

Helena Edelson

CEO, The Axis Initiative
Helena is using AI and complex adaptive systems to study and help endangered species under climate change, biodiversity loss, human-wildlife conflict and illegal wildlife trade. Bridging academia and industry, she is a member of the Environmental Intelligence team of the Interagency...

Omar Alonso

Tech Lead, Instacart
Omar is a Tech Lead at Instacart where he works on the intersection of information retrieval, knowledge graphs, and human computation. 

Bryan Cantrill

CTO, Oxide Computer

Gene Linetsky

CTO, True Accord


Thursday November 12, 2020 12:30pm - 1:30pm PST
cloud

2:00pm PST

Were Microservices a Huge Mistake?
Were microservices a huge mistake?
Nick Schrock, the co-creator of GraphQL, posited that there will be a multi-billion dollar industry to clean up the mess that microservices will leave us with, and it will take years.

The backlash against microservices is real. The promise to decompose a monolith turned into a nightmare of observability, async debugging, and stretching your CAP mojo to the limits.
But microservices stand tall and proud where they took hold with experienced teams. They solve the problems of those companies who could no longer maintain their monoliths. They allow remote work and create new flexibility in managing distributed teams building distributed systems.

What will happen to the microservices idea going forward? Our panel, with decades of wisdom between them, will find the answer together -- in a fierce debate where the truth is born!

Moderators

Alexy Khrabrov

Program Chair, Reactive Summit

William Morgan

CEO, Buoyant
William Morgan is the CEO of Buoyant. Prior to founding Buoyant, he was an infrastructure engineer at Twitter, where he ran several teams building on product-facing backend infrastructure. He has worked at Powerset, Microsoft, adap.tv, and MITRE Corp, and has been contributing to...

Speakers

Pete Hunt

Software Engineer, Twitter
Software engineer at Twitter. Formerly CEO / cofounder at Smyte, and manager / engineer at Facebook. Early React.js team member.

Kikia Carter

Software Engineer, Lightbend
Kiki Carter is a Principal Enterprise Architect at Lightbend, Inc. She has a passion for enterprise transformations & innovative solutions using emerging technologies to modernize heritage environments. She finds joy developing large scale, high performance distributed systems using...

Chris Richardson

Founder, Eventuate
Chris Richardson is a developer and architect. He is a Java Champion, a JavaOne rock star and the author of POJOs in Action, which describes how to build enterprise Java applications with frameworks such as Spring and Hibernate. Chris was also the founder of the original CloudFoundry.com...

Nick Schrock

Founder, Elementl
Nick is the founder/CEO of Elementl and the creator of Dagster (http://dagster.io) the data orchestrator for machine learning, analytics, and ETL. Prior to founding Elementl Nick was a principal engineer and director at Facebook and created GraphQL.


Thursday November 12, 2020 2:00pm - 3:00pm PST
cloud

3:15pm PST

Apache Spark meets FIPS standard
As developer communities in enterprise sectors such as health care, finance, telecom, retail, and manufacturing consider Apache Spark for their data processing, they are also looking for ways to harden their enterprise security, and FIPS compliance is the first thing that comes to mind (FIPS are the Federal Information Processing Standards). In principle, this talk can help any Java-based application meet FIPS compliance. Cloud HSM and Keep Your Own Key take enterprise security to another level. We will briefly cover some relevant details about these, for example why some enterprises prefer to build their own solutions rather than license or buy them from others. The talk will cover use cases that call for security compliance, the FIPS standard, and what lies beyond it, along with practical guidance on how to be security aware while programming with Apache Spark in a FIPS-compliant environment. It will also cover some challenges in setting up the environment itself, depending on the time slot given.

Speakers

Prashant Sharma

System Software Engineer, IBM
Open source contributor, part of the CODAIT (Center for Open Source Data and AI Technologies) group at IBM. Apache Spark committer and PMC member.


Thursday November 12, 2020 3:15pm - 3:45pm PST
cloud

3:15pm PST

Query Planning in GraphQL
For a given GraphQL query there can be multiple ways to execute it, with varying performance. A query plan contains the steps to execute the query in the most efficient way. It makes it possible to analyze the query execution cost before the query is executed, visualize it for the client, and even allow the client to modify the plan according to his or her knowledge of the query object model and its semantics.

Speakers

Greg Kesler

Principal Software Engineer, Intuit
Greg is a tech leader at Intuit Developer’s Group focusing on building APIs, SDKs and tools for Intuit and partner developers. Since Intuit started its journey to decompose monoliths into microservices, Greg has been a game changer to help the company to build request orchestration...


Thursday November 12, 2020 3:15pm - 3:45pm PST
functional

3:45pm PST

Afternoon break
Mingle in the hallway track!

Thursday November 12, 2020 3:45pm - 4:15pm PST
hallway lunch https://spatial.chat/s/KonfyCare

4:15pm PST

Ray: A System for High-performance, Distributed Machine Learning Applications
Ray is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications from a laptop to a cluster. While broadly applicable, it was developed to solve the unique performance challenges of ML/AI systems, such as the heterogeneous tasks and state management required for reinforcement learning (RL), everything from training neural networks to running simulators. Ray is now used in many production deployments. I'll explain the problems that Ray solves for cluster-wide scaling of general Python applications with specific examples from RLlib, a Ray-based RL library. We’ll see that Ray’s features include rapid scheduling and execution of “tasks” for RL, management of distributed model state (parameters), and an intuitive API. You’ll also learn how and when to use Ray in your projects.

Speakers

Dean Wampler

Head of Developer Relations, Domino Data Lab
Principal Software Engineer at Domino Data Lab, where I work on various aspects of the Domino platform for data scientists. Author of O'Reilly's "Programming Scala", which has a third edition forthcoming with coverage of Scala 3.


Thursday November 12, 2020 4:15pm - 4:45pm PST
cloud

4:15pm PST

Applicative: The Origin Story
We've all seen many Monad tutorials, and maybe a few on Functors, but Applicative is lesser known, which is a shame because it has some super powers. In this talk I'll tell the origin story of applicative, and show that once fully understood, it is a very powerful tool in your functional programming toolkit. The talk begins with a quick introduction to pure functional programming with effects, for beginners in the audience. Next we'll look at the original paper in which applicative programming with effects is described. The code will be translated from the original Haskell to Scala, and we'll see how we can solve problems first with Monads, then more flexibly and elegantly with Applicative. Finally, we'll look at how Applicative plays its part in the much lauded traverse function.

Speakers

Justin Heyes-Jones

Software Developer, Yoppworks
Justin started his career making video games and has shipped over 20 titles, working at EA, Activision and Sony amongst others. These days he works at consulting company Yoppworks. He loves to spend time learning more about functional programming, working on open source and sharing...


Thursday November 12, 2020 4:15pm - 4:45pm PST
functional

5:00pm PST

Event Streaming with Kafka Streams
All things change constantly! And dealing with constantly changing data at low latency is pretty hard. It doesn't need to be that way. Kafka has become a de-facto standard for ingesting event-based data and is considered the central nervous system for data in many organizations. Kafka Streams is a stream processing library built on top of Kafka. It allows you to transform streams, perform analysis, and run stateful operations on the incoming data. In this presentation, Viktor will cover:
• A quick intro into Kafka 101.
• What is Kafka Streams.
• How Kafka Streams enables/simplifies event-based processing.

By the end of this presentation, you will understand the basics of Kafka Streams and how you can start using it to implement event streaming applications!


Speakers

Viktor Gamov

Developer Advocate, Confluent


Thursday November 12, 2020 5:00pm - 5:30pm PST
cloud

5:00pm PST

Functional implementation of Finger Trees in JavaScript
A narrow "subset" of JavaScript is used to implement Finger Trees in a fully functional style. Finger Trees are a magically efficient data structure with fast insertion, search, and split. Even if you don't know JavaScript, you'll see how beautiful it is (as are Finger Trees).

Speakers

Vlad

contributor, Patryshev
Software developer with experience in categories and toposes. Teaching logic and formal methods at Santa Clara University. Working as a data engineer at Salesforce.


Thursday November 12, 2020 5:00pm - 5:30pm PST
functional

5:00pm PST

Enhancing Spark's Power with ZIO
Session is prerecorded!

Combining ZIO/Future with Spark can drastically speed up the performance of your ML projects, as data sources can be fetched in parallel without holding up computation. Obviously ZIO is much better than Future, but it can be challenging to set up. Leo will present some benchmark results, then discuss how his open-source library handles all boilerplate issues so you can easily implement Spark and ZIO in your ML projects!



Speakers

Leo Benkel

Data engineer
Leo Benkel grew up in France and has been living in San Francisco for the last 10 years. He is passionate about functional programming and teaches Scala at Demandbase. He enjoys designing/architecting libraries and software products. Simply, Leo loves exploring where nobody has ever...



Thursday November 12, 2020 5:00pm - 5:30pm PST
prerecorded

5:45pm PST

Multi-language Runtime Environments for Smart Contract Execution
The recent interest in distributed ledger technologies has spurred the development of a number of new programming languages to meet the challenge of decentralized state maintenance. A well-identified concern with such development has been the sluggish growth in the number of experienced decentralized application (dApp) developers in the space as well as their relative lack of familiarity with how these novel languages behave in adversarial execution environments. In this talk, we will present our realization and subsequent performance analysis of a multi-language runtime for smart contracts in a Scala-based blockchain client using the polyglot capabilities of GraalVM. Furthermore, we will explore the security implications inherent in such a setting and our recent work developing a smart contract system for our application-specific blockchain network.

Speakers

James Aman

CTO, Topl
Founder and Chief Technology Officer at Topl, a blockchain startup focused on empowering social impact projects around the world. Previously, served as CTO with Corporate Partners in Education, a startup building a continuous fundraising mechanism for K-12 schools. I received my PhD...


Thursday November 12, 2020 5:45pm - 6:15pm PST
cloud

5:45pm PST

A tour of String diagrams and Monoidal categories
In this talk I will describe what String diagrams are and how they relate to Monoidal categories, and present a declarative Scala DSL inspired by these concepts. Two interpreters will be presented: a GraphViz-based renderer and a ZIO-based execution engine.

Speakers

Juan Pablo Romero

Software Engineer, Apple
I'm a functional programmer with 10 years of industry experience. Originally from Mexico City, I've been living in San Francisco for the past 6 years. One of my passions is understanding the intersection between Math and Programming. Also high on the list of my interests is finding...


Thursday November 12, 2020 5:45pm - 6:15pm PST
functional
 
Friday, November 13
 

6:45am PST

Second Day Opening Remarks
Opening of the second and final day of the first global online Scale By the Bay, held across two time zones, from the founder and organizer Dr. Alexy Khrabrov and the onsite producer Oli Makhasoeva.

Speakers

Alexy Khrabrov

Program Chair, Reactive Summit

Margarita

CEO, Konfy
My journey as an organizer started with a small scientific conference in Novosibirsk, Russia. Since then I’ve been focusing on building communities around the globe. How do we build a caring community? In order to build a safe home, you need a strong construction. We are planning every...


Friday November 13, 2020 6:45am - 7:00am PST
cloud

7:00am PST

Spreading the JAMstack
A talk about the JAMstack. Slides here: https://ss10.dev/jam-talk

Speakers

Scott Spence

Web Developer
Father, husband. Web Developer. JAMstack enthusiast. JavaScript, Gatsby, React, GraphQL and Vercel. http://my.pronoun.is/he


Friday November 13, 2020 7:00am - 7:30am PST
cloud

7:00am PST

AIoT: Why now? And How To?
Both IoT and AI are not new, and some have always assumed that one includes the other.
Combining them for real on the industrial level means huge opportunities, but also huge challenges:
Few proven business models (or is this a paradox itself?), cultural challenges and technological complexity.
This talk builds on a couple of real-world examples, and then introduces an AIoT good practices framework.

Speakers

Dirk Slama

VP Co-innovation and IT/IoT Alliances, Bosch
Dirk Slama is Vice President Co-innovation and IT/IoT Alliances at Robert Bosch GmbH. As Conference Chair of the Bosch ConnectedWorld, Dirk helps shape the IoT strategy of Bosch. As Chairman of the Steering Committee of the Industrial Internet Consortium (IIC), he helps create...


Friday November 13, 2020 7:00am - 7:30am PST
data

7:45am PST

How Micro-Service Patterns Change When the Database Is a Scalable API
There are multiple reasons to adopt microservices but often the initial goal is to scale processing and eliminate database bottlenecks. When we talk about data management, there are several strategies/patterns to implement a data layer for microservice architectures. In this talk, we explore how those patterns change if the database is a scalable API that maintains global consistency.

Speakers

Brecht De Rooms

Senior Developer Advocate, Fauna
Brecht De Rooms is a senior developer advocate at Fauna. He is a programmer who has worked extensively in IT as a full-stack developer and researcher in both the startup and IT consultancy worlds. It is his mission to shed light on emerging and powerful technologies that make it easier...


Friday November 13, 2020 7:45am - 8:15am PST
cloud

7:45am PST

Put Your Machine Learning on Autopilot
A typical machine learning workflow consists of many steps including data analysis, feature engineering, model training and model tuning. What if our machine learning platform could perform these tasks for us and generate high-quality model candidates ready for review and deployment? In this session, I will discuss the concept of Automated Machine Learning (AutoML) and how the latest advances in AutoML allow you to put your machine learning models into autopilot mode while maintaining full visibility and control. In a demo, we will also see AutoML in action.

Speakers

Antje Barth

Developer Advocate, AI and Machine Learning, AWS
Antje is a Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS) based in Düsseldorf, Germany. She is co-author of the O'Reilly Book, "Data Science on AWS." Antje is also co-founder of the Düsseldorf chapter of Women in Big Data. Antje frequently speaks at...


Friday November 13, 2020 7:45am - 8:15am PST
data

8:30am PST

Keynote: Apache Pulsar @ Splunk
The engineering teams within Splunk have been using several technologies (Kinesis, SQS, RabbitMQ, and Apache Kafka) for enterprise-wide messaging for the past few years, but have recently made the decision to pivot toward Apache Pulsar, migrating existing use cases and embedding it into new cloud-native service offerings such as the Splunk Data Stream Processor (DSP). In this talk, we will discuss in detail the different dimensions along which we evaluated Apache Pulsar before adopting it.

Speakers

Karthik Ramasamy

Senior Director Of Engineering, Splunk
Karthik Ramasamy is a Senior Director of Engineering managing the Pulsar team at Splunk. Before Splunk, he was the co-founder and CEO of Streamlio that focused on building next generation event processing infrastructure using Apache Pulsar and led the acquisition of Streamlio by Splunk...


Friday November 13, 2020 8:30am - 9:15am PST
cloud

9:30am PST

Keynote: Next-generation frameworks for Large-scale AI
Abstract: The deep-learning revolution has achieved impressive progress through the convergence of data, algorithms, and computing infrastructure. The availability of web-scale labeled data and parallelism of GPUs enabled us to harness the power of neural networks. However, for further progress, we cannot solely rely on bigger models. We need to reduce our dependence on labeled data, and design algorithms that can incorporate more structure and domain knowledge. Examples include tensors, graphs, physical laws, and simulations. I will describe efficient frameworks that enable developers to easily prototype such models, e.g. TensorLy to incorporate tensorized architectures, NVIDIA Isaac to incorporate physically valid simulations, and NVIDIA RAPIDS for end-to-end data analytics. I will then lay out some outstanding problems in this area.

Speakers
avatar for Anima Anandkumar

Anima Anandkumar

Professor, Director of AI, Caltech and NVIDIA
Anima Anandkumar holds dual positions in academia and industry. She is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generation AI algorithms. At Caltech, she is... Read More →


Friday November 13, 2020 9:30am - 10:00am PST
cloud

10:05am PST

Cloudflow: Spark, Flink, and Akka Working together on Kubernetes
Streaming is a huge part of building modern software. The industry is littered with tips, tools, frameworks, and engines designed to help developers solve streaming problems, but when push comes to shove there is no silver bullet. Streaming problems require a variety of tools working together to deliver high-quality solutions. Cloudflow is an open-source project from Lightbend designed to make it simple for developers to use the right tool for the job when building streaming applications. Cloudflow makes it simple to use the best features of Spark, Flink, Akka, and Kafka together, running on top of Kubernetes. In this session we will dive into the concepts behind Cloudflow and what makes it a unique streaming framework. We will also dive into a Cloudflow application, write code, and deploy a complex streaming application live. By the end of the session you will understand how Cloudflow helps developers build higher-quality streaming applications, with fewer lines of code and much less frustration.

Speakers
avatar for Nolan Grace

Nolan Grace

Senior Solution Architect, Lightbend
Nolan Grace is a Senior Solution Architect at Lightbend who splits his time between helping customers build reactive software and traveling the world tasting food and wine.  Nolan is both a certified sommelier as well as a certified reactive architect.  Recently Nolan has been spending... Read More →


Friday November 13, 2020 10:05am - 10:35am PST
cloud

10:05am PST

Goku Flow: A Self-Service Data Pipeline Builder
To provide high customer satisfaction, Workday leverages operational data (both structured and unstructured) to optimize its services. At Workday, a centralized big data platform is used for descriptive, diagnostic, and predictive analytics for all personas. This talk will introduce the challenges that the platform needs to address. Afterwards, we will present the tools used for various analysis tasks. Finally, we will present an in-house data pipeline platform, which is used for automation.

Speakers
avatar for Lei Gao

Lei Gao

Sr. Machine Learning Engineer, Workday
Lei Gao is a senior machine learning engineer at Workday, where he leads a team building a data science analytics platform.


Friday November 13, 2020 10:05am - 10:35am PST
data

10:45am PST

8 Lessons Learned from using Kafka with 1000 Scala microservices
Kafka is the bedrock of Wix's distributed microservices system. Over the last 5 years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1500 microservices, mostly written in Scala. We’ve managed to achieve higher decoupling and independence for our various services and dev teams that have very different use cases, while maintaining a single uniform infrastructure. Our Kafka infrastructure is called Greyhound and was recently completely rewritten using ZIO. In this talk you will learn about 8 key decisions and steps you can take in order to safely scale up your Kafka-based system. These include:
* How to increase dev velocity of event-driven code.
* How to optimize working with Kafka in a polyglot setting.
* How to support a growing amount of traffic and developers.

Speakers
avatar for Natan Silnitsky

Natan Silnitsky

Backend Infra Developer, Wix.com
Natan Silnitsky is a backend-infra engineer @Wix.com. He is on the Data streaming team in charge of building event driven libraries and tools on top of Kafka and ZIO. Before that he was part of a task force that was responsible for building the next generation CI system at Wix on... Read More →


Friday November 13, 2020 10:45am - 11:15am PST
cloud

10:45am PST

Owned By Statistics: How Kubeflow & MLOps Can Help Secure Your ML Workloads
While machine learning is spreading like wildfire, very little attention has been paid to the ways that it can go wrong when moving from development to production. Even when models work perfectly, they can be attacked and/or degrade quickly if the data changes. Having a well-understood MLOps process is necessary for ML security! In this talk, we will demonstrate the common ways machine learning workflows go wrong, and how MLOps pipelines provide reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction of MLOps as an industry, and how we can use it to move faster, with less risk, than ever before.

Speakers
avatar for David Aronchick

David Aronchick

Program Manager, Azure, Microsoft
David Aronchick leads open source machine learning strategy at Azure. He spends most of his time helping humans convince machines to be smarter. (He’s only moderately successful at this.) Previously, he led product management for Kubernetes on behalf of Google, launched Google Kubernetes... Read More →


Friday November 13, 2020 10:45am - 11:15am PST
data

11:00am PST

Scalac booth: Exploring ZIO Prelude: The game-changer for typeclasses in Scala!
In this talk, we are going to explore how ZIO Prelude provides us with an accessible and fun way of writing pure, generic, and composable code in Scala, without needing to appeal to the traditional Functor hierarchy. More specifically, we'll explore these use cases:
- Combining data structures
- Traversing data structures
- Validating data structures
- And... working with the brand-new ZPure!

Friday November 13, 2020 11:00am - 1:00pm PST
hallway lunch https://spatial.chat/s/KonfyCare

11:30am PST

Conquering All Stores with Gimel – A Unified Data Processing Platform
At PayPal, data engineers, analysts, and data scientists work with a variety of data sources (RDBMS, NoSQL, Messaging, Documents, Big Data, Time Series Databases), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL), and execution models (stream, batch, interactive) to process petabytes of data. Due to this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, optimizations, etc., which impacts time-to-market (TTM). To solve this problem and to make product development more effective, PayPal Data Platforms developed "Gimel", an open source, unified analytics data platform which provides access to any storage through a single unified data API and SQL, which are powered by a centralized data catalog. Join us and learn how to build a unified data platform with Scala and Spark and how to conquer the heterogeneous data landscape, providing an effective solution for many users in the enterprise.

Speakers
avatar for Deepak Chandramouli

Deepak Chandramouli

PayPal
Deepak Chandramouli is an Engineering Lead in PayPal’s Enterprise Data Platforms Organization. Deepak currently manages the engineering for products - UDC (Unified Data Catalog) and Gimel.io (Apache Spark based Data Abstraction Layer). Deepak incubated Gimel and helped open source... Read More →
avatar for Anisha Nainani

Anisha Nainani

PayPal
Anisha is a Senior Software Engineer focusing on building Big Data platforms. She has been a core contributor of PayPal’s Unified Analytics Platform – Gimel which provides access to any storage through a single unified data API and SQL, that are powered by a centralized data catalog... Read More →
avatar for Vladimir Bacvanski

Vladimir Bacvanski

Distinguished Architect, PayPal
Dr. Vladimir Bacvanski is a Principal Architect with Strategic Architecture at PayPal. He is the lead architect for Privacy and the lead architect for Developer Experience, which includes a variety of tools in the DevOps arena. Before joining PayPal, Vladimir was the CTO and founder... Read More →


Friday November 13, 2020 11:30am - 12:00pm PST
cloud

11:30am PST

A Reinforcement Learning Framework in Scala 3
This talk covers the implementation details of a personal project I've developed to make it easier for engineers to learn reinforcement learning and try out different reinforcement learning approaches. The talk will cover some of the features of Scala 3 that make the library better, teleologically. That is, it will cover how Scala 3 makes it easier to use the reinforcement learning library and easier to learn reinforcement learning from it, because of Scala 3's features. In this talk you will learn the basics of reinforcement learning and how to build on that basic understanding using this library. You will also learn how to use the library to compare different reinforcement methods.

Speakers
avatar for Robert J. Neal

Robert J. Neal

Software Engineer, Twitter
Robert is a software engineer of the functional style who is focused on experimentation systems and decision science. His current preoccupations are philosophy of probability, epistemology of statistics, and reinforcement learning.


Friday November 13, 2020 11:30am - 12:00pm PST
data

12:00pm PST

Lunch break
Mingle in the hallway track!

Friday November 13, 2020 12:00pm - 12:30pm PST
cloud

12:00pm PST

Netflix booth: AMA on Polynote

Friday November 13, 2020 12:00pm - 1:00pm PST
hallway coffee https://spatial.chat/s/KonfyCare

12:30pm PST

Programming Languages in the Era of the Cloud
How do programming languages (PLs) continue to evolve in the age of clouds?
How do we build cloud-native applications?
Is there a great standardization coming, restricting our choices, or will there be a Renaissance and a thousand flowers will bloom?

Moderators
avatar for Evan Chan

Evan Chan

Senior Data Engineer, UrbanLogiq
Evan is currently Senior Data Engineer at UrbanLogiq, where he is using Rust, among other tools, in building robust data platforms to help public servants build better communities. Evan has been a distributed systems / data / software engineer for twenty years. He led a team developing... Read More →
avatar for Wiem Zine Elabidine

Wiem Zine Elabidine

Software Engineer
avatar for Alexy Khrabrov

Alexy Khrabrov

Program Chair, Reactive Summit

Speakers
avatar for Bill Venners

Bill Venners

Principal, Artima
Bill Venners is president of Artima, Inc., publisher of Scala consulting, training, books, and developer tools. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality... Read More →
avatar for Baruch Sadogursky

Baruch Sadogursky

Head DevOps Advocacy, JFrog
Baruch Sadogursky (a.k.a JBaruch) is the Head of DevOps Advocacy and a Developer Advocate at JFrog. His passion is speaking about technology. Well, speaking in general, but doing it about technology makes him look smart, and 19 years of hi-tech experience sure helps. When he’s not... Read More →
avatar for Rúnar Bjarnason

Rúnar Bjarnason

Cofounder, Unison
My name is Rúnar. I’m a software engineer in Boston, an author of a book, Functional Programming in Scala, and cofounder of Unison Computing. We're making a distributed programming language called Unison. Talk to me about functional programming, relational database theory, compilers... Read More →
avatar for Jon Pretty

Jon Pretty

Software Engineer, Propensive
avatar for Bryan Cantrill

Bryan Cantrill

CTO, Oxide Computer
avatar for Jaana Dogan

Jaana Dogan

Principal Engineer, Amazon
Jaana Dogan is working on monitoring, observability and performance tools for ECS, EKS, App Runner, Batch, and other container services.


Friday November 13, 2020 12:30pm - 1:30pm PST
cloud

1:45pm PST

Rebuilding Twitter’s public API

In August of this year, Twitter launched v2 of our API on top of a new architecture that could more easily scale with the large number of API endpoints we plan to deliver in the future. As part of this design process, we drafted goals around Abstraction, Ownership, and Consistency. With these goals in mind, we then designed and built a common platform to host all of our new Twitter API endpoints. In this talk, we'll walk through the goals of our API platform, and then discuss how we implemented it using Scala, GraphQL, and the OpenAPI Spec.

Speakers
avatar for Steve Cosenza

Steve Cosenza

Senior Staff Software Engineer, Twitter
@scosenza


Friday November 13, 2020 1:45pm - 2:15pm PST
cloud

1:45pm PST

Automate the boring ML stuff with pipelines
Production machine learning systems fail in unexpected ways. The data shifts and the accuracy decreases, or the preprocessing steps don’t match what the model expects. The model looks great when you look at one metric, but you measure it another way and it looks terrible. Tired of writing boring custom code to fix these problems? You need an automated way to stop these potential failures before they happen. In this talk, we’ll describe and demo automated machine learning pipelines using TensorFlow Extended and Kubeflow Pipelines. These pipelines include steps to validate the data that flows into the pipeline, preprocess the data, kick off a training run, analyze the model in-depth, and push the final model to its serving location. All of the steps are orchestrated using Kubeflow Pipelines, which lets you schedule a new pipeline run and makes sure the components are completed in the correct order. We'll show an example project of setting up a simple ML Pipeline which lets us produce consistent ML models. Using public data, we describe how these automated machine learning pipelines solve the problem of a mismatch between feature engineering and model training. We’ll also show how we can analyze our models in depth to ensure they provide a fair experience to all users.

Speakers
avatar for Hannes Hapke

Hannes Hapke

Senior Machine Learning Engineer, SAP
Hannes Hapke is a senior data scientist for Concur Labs at SAP Concur, where he explores innovative ways to use machine learning to improve the experience of a business traveler. Prior to joining SAP Concur, Hannes solved machine learning infrastructure problems in various industries... Read More →
avatar for Catherine Nelson

Catherine Nelson

Senior Data Scientist, Concur Labs @ SAP Concur
Catherine Nelson is a Senior Data Scientist for Concur Labs at SAP Concur, where she explores innovative ways to use machine learning to improve the experience of a business traveller. Her key focus areas range from ML explainability and model analysis to privacy-preserving ML. She... Read More →


Friday November 13, 2020 1:45pm - 2:15pm PST
data

2:30pm PST

Cloud-Native Apache Spark: why and how to migrate your Spark pipelines to Kubernetes
Apache Spark has been able to run on top of Kubernetes (as opposed to Hadoop YARN or Standalone mode) since Spark version 2.3 (2018). In the past two years, the support for running Spark on Kubernetes has grown a lot, and a lot of companies have adopted it -- in fact, Spark-on-Kubernetes will be officially considered "production ready" with the upcoming release of Spark 3.1. In this talk, we will go over the main reasons why many companies decide to adopt Spark-on-Kubernetes, and our best practices for making Spark on Kubernetes reliable and performant at scale. No prior knowledge of Spark or Kubernetes is required, but you should expect a technical session heavy with code examples and real-life tips to help you productionize Spark on Kubernetes.

Speakers
avatar for Jean-Yves Stephan

Jean-Yves Stephan

Co-Founder & CEO, Data Mechanics
JY is the co-founder of Data Mechanics, a cloud-native Spark platform making Spark easy to use and cost-effective for data engineers. Their platform is deployed on a Kubernetes cluster inside their customers' cloud accounts (AWS, GCP, and Azure are supported). Prior to Data Mechanics... Read More →


Friday November 13, 2020 2:30pm - 3:00pm PST
cloud

2:30pm PST

Programming machine learning algorithms in hardware, sanely, using Haskell and Rust!
We’re all used to programming software. But what about programming reconfigurable hardware? That’s exactly what we can do with Field Programmable Gate Arrays (FPGAs). Programming hardware opens up a whole new dimension to optimizing performance and resource utilization. However, programming FPGAs is challenging and requires esoteric tooling. We can do better! In this talk, we will show how we can use the Clash language to safely program FPGAs and Rust to correctly use them in a machine learning application. Our first step is converting a functional machine learning program into computer hardware. How do we do that? We use Clash! Clash is a Haskell-like language that allows programmers to define hardware structurally. It does this by compiling functional programs into logic gates that are then turned into circuits on the FPGA. Clash includes a dependent type system, which allows Clash to guarantee that the circuits are wired up correctly on the FPGA, leading to fewer errors. We will demonstrate how a simple machine learning algorithm can be sped up by the hardware parallelism afforded by FPGAs, and highlight how Clash’s type system provides compile-time guarantees that the hardware circuits are implemented correctly. OK, so we’ve got our FPGA hardware wired up correctly using Clash, but how do we make sure we use it correctly? What if we’re using it to run powerful and dangerous magnets in an imaging scanner? Well, we'd better do it right. We’ll talk about how we can use Rust to build a safe session interface over our hardware machine learning algorithm that provides important compile-time guarantees on how the hardware is accessed in software. Rust’s borrow checker ensures that we cannot access resources such as FIFO buffers outside of the scope of the hardware they refer to; additionally, we cannot forget to perform important hardware cleanup when these resources go out of scope.
This application will demonstrate how Rust’s unique type system enables an interface to the Clash-validated FPGA hardware that is both ergonomic and validated at compile time.

Speakers
avatar for Daniel Hensley

Daniel Hensley

Senior Software Engineer
Daniel Hensley is a Senior Software Engineer working on signal processing, image reconstruction algorithms, and related applied math applications. He also works on embedded and hardware-facing code such as that operating medical imaging scanners. In these areas, where performance... Read More →
avatar for Ryan Orendorff

Ryan Orendorff

Research Scientist, Facebook
Ryan Orendorff is a Research Scientist working on novel algorithms for image and data reconstruction using convex optimization and other techniques. On the side, Ryan works on projects related to theorem proving and programming languages.


Friday November 13, 2020 2:30pm - 3:00pm PST
data

2:30pm PST

Packaging & Deployment Options for Scala Applications / Services
Prerecorded session!

There are many ways to package a Scala application for running on a server, Kubernetes cluster, or cloud service, including sbt native packager, Docker, and CNCF Buildpacks. Given those packaging tools there are also many different ways to put together the deployable artifacts. For instance, maybe you want to use GraalVM to transform the service into a native image. Techniques like container image layering can also support efficient image rebuilding. This session will explore these tools and techniques for packaging and deploying Scala applications.

Speakers
avatar for James Ward

James Ward

Developer Advocate, Google Cloud
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals... Read More →


Friday November 13, 2020 2:30pm - 3:00pm PST
prerecorded

3:15pm PST

Optimizing Latency-Sensitive Queries for Presto at Facebook
For many latency-sensitive SQL workloads, Presto is often bound by retrieving distant data. In this talk, Rohit Jain and James Sun from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers to improve unsatisfactory Presto latency. This talk will focus on:
- Insights into the Presto workloads at Facebook w.r.t. cache effectiveness
- API and internals of the Alluxio local cache, from design trade-offs (e.g. caching granularity, concurrency level, etc.) to performance optimizations
- Initial performance analysis and the timeline to deliver this feature for general Presto users
- Discussion of our future work to optimize cache performance with deeper integration with Presto

Speakers
avatar for Rohit Jain

Rohit Jain

Software Engineer, Facebook
Technologist. Specializes in building large scale distributed systems. Loves solving real-world complex problems. 
avatar for Bin Fan

Bin Fan

VP of Open Source, Alluxio
Bin Fan is VP of open source at Alluxio and the PMC maintainer of Alluxio open source. Prior to joining Alluxio as a founding engineer, he worked for Google to build the next-generation storage infrastructure. Bin received his PhD in computer science from Carnegie Mellon University... Read More →


Friday November 13, 2020 3:15pm - 3:45pm PST
cloud

3:15pm PST

Data Science in Scala with ScalaPy
Python is the dominant language for data science today with a plethora of machine learning and scientific computing libraries. Scala, on the other hand, is the dominant language for big data processing. What if we could bring these two worlds together? ScalaPy enables Scala applications to use Python libraries with a seamless interop layer. With support for core Python features including native bindings, ScalaPy can be used anywhere from training neural networks on GPUs with TensorFlow to making astronomical calculations with Astropy. In addition, ScalaPy supports creating type definitions to enable type-safe interactions with Python libraries. In this talk, we’ll explore how ScalaPy works and how it can be used in different applications. We’ll also look at support in environments like Jupyter notebooks and ways to optimize interop performance.

Speakers
avatar for Shadaj Laddad

Shadaj Laddad

Student, UC Berkeley
Shadaj loves exploring the boundaries of programming, usually with Scala for its combination of functional and object-oriented concepts. He is currently a student at UC Berkeley studying Electrical Engineering and Computer Science. He has interned at Google Brain, Facebook, Apollo... Read More →


Friday November 13, 2020 3:15pm - 3:45pm PST
data

3:45pm PST

Afternoon break
Mingle in the hallway track!

Friday November 13, 2020 3:45pm - 4:15pm PST
cloud

4:15pm PST

Deploying a Modern Serverless Reactive Container to the Cloud
We have been hearing a lot about the benefits of using the reactive approach to solve concurrency problems in distributed systems. While reactive programming refers to the implementation techniques used at the coding level, at the systems deployment and runtime level we can leverage a robust yet very flexible and lightweight framework such as Vert.x to deliver. In this session, we will first learn what the missions of a reactive system are, which, among many things, include handling multiple concurrent data stream flows, controlling back pressure, and managing errors in an elegant manner. We will also discuss the special polyglot nature of Vert.x, its event loop, and its use of the Verticle model. Live coding will accompany this session to illustrate how to program a simple use case using multiple JVM languages such as Java and Kotlin, and then we will build and dockerize it to be deployed as a serverless container, leveraging Knative, to a Kubernetes cluster. With all of the excitement in the cloud native world, and buzzwords like reactive systems, event-driven, serverless apps, containers, clusters, etc. flying around, this talk will give a very concise introduction to what each of these terms means, and how they can be strung together to illustrate an end-to-end reactive application, from implementation, compilation, and containerization all the way to deployment onto the cloud. Participants will learn some important concepts about event-driven and reactive systems, how the lightweight yet very powerful Eclipse Vert.x can enable them to start building event-driven and reactive applications and microservices easily, and how to containerize and deploy them to the cloud in an unintimidating fashion.

Speakers
avatar for Mary Grygleski

Mary Grygleski

Streaming Developer Advocate, DataStax
Mary is a Java Champion and a passionate Senior Developer Advocate at DataStax, a leading data management company that champions Open Source software and specializes in Big Data, DB-as-a-service, Streaming, and Cloud-Native systems. She spent 3.5 years as a very effective advocate... Read More →


Friday November 13, 2020 4:15pm - 4:45pm PST
cloud

4:15pm PST

NLP text recommender system journey to automated training pipeline with Spark and Sagemaker
This talk will cover how we built and productionized automated machine learning pipelines at Salesforce. We will trace the journey from heuristics to automated retraining, using technologies including but not limited to Scala, Python, Apache Spark, Docker, and SageMaker for training and serving. We will walk through the generally applicable steps of data prep, feature engineering, training, evaluation/comparisons, and continuous model training, including data feedback loops, in containerized environments with AWS SageMaker. We will talk about our deployment and validation approach. Finally, we’ll draw lessons from iteratively building an enterprise ML product. Attendees will learn about the mental models for building end-to-end production ML pipelines and GA-ready products.

Speakers
avatar for Aditya Sakhuja

Aditya Sakhuja

Engineering Lead, Salesforce
Aditya Sakhuja is an Engineering Lead at Salesforce Einstein building ML products. He built the early prototype of a question answering system in Salesforce's ML journey and helped ship multiple ML products over the next few years in the service and collaboration space including knowledge... Read More →


Friday November 13, 2020 4:15pm - 4:45pm PST
data

5:00pm PST

The art of being resilient. How to handle failures gracefully in Distributed Systems.
You invest your time and effort breaking up that monolithic Frankenstein into a suite of elegant, composable microservices, you containerize them, and you deploy them somewhere in the cloud in the form of distributed resources. Then you proudly watch it all come together, reaping the benefits of the most scalable architectures. It is all fine and dandy from this point on. Too good to be true? Of course! This session is about what to do when you wake up to find yourself in the weeds, diagnosing that first bug or application failure through the convoluted web of distributed systems of your own doing. Through a series of code snippets, we will introduce the most important open-source projects and tools, and show how to strike the right balance of tooling and best practices to handle failures gracefully and detect them quickly in the world of distributed systems.

Speakers
avatar for Muktesh Mishra

Muktesh Mishra

Staff Software Engineer, DPE, Adobe
Muktesh is currently working as a Staff Software Engineer for Adobe. He is an open-source contributor to 20+ projects and enjoys polyglot programming. He is primarily interested in, and contributes to, Microservices, Cloud Computing, Continuous Delivery, Containerization, Architectures... Read More →


Friday November 13, 2020 5:00pm - 5:30pm PST
cloud

5:00pm PST

State of the art natural language understanding at scale
Natural language understanding is a key component in many data science systems that must understand or reason about text. Common use cases include information extraction, summarization, sentiment analysis, document classification, language modeling, and disambiguation. This talk introduces the Spark NLP library - the world's most widely used NLP library in the enterprise. Accuracy, speed, and scalability benchmarks and design best practices for building NLP, ML and DL pipelines will be shared. The library implements core NLP algorithms including lemmatization, part of speech tagging, dependency parsing, named entity recognition, spell checking, document classification, and sentiment detection. The talk will demonstrate using these algorithms to build commonly used pipelines, using live Python notebooks that will be made publicly available after the talk.

Speakers
avatar for David Talby

David Talby

CTO, John Snow Labs
David Talby is a chief technology officer at John Snow Labs, helping fast-growing companies apply artificial intelligence to solve real-world problems in healthcare & life science. David is the creator of the Spark NLP library and has extensive experience in building and operating... Read More →


Friday November 13, 2020 5:00pm - 5:30pm PST
data

5:45pm PST

Getting Things Done in the Scala REPL
The Scala REPL lets you interactively run Scala code snippets and see the results. This talk will explore how far you can stretch the Scala REPL, performing difficult tasks that you may normally associate with larger efforts. Through these exercises, we will see how much useful work you can accomplish with a tiny amount of Scala code.

Speakers
avatar for Li Haoyi

Li Haoyi

Software Engineer, Dropbox
Haoyi is a software engineer at Dropbox who works on Python/Coffeescript during the day and contributes to the Scala open-source ecosystem at night. He is known for his contributions to the Scala.js project, writing a JVM from scratch in 3000LOC, and doin


Friday November 13, 2020 5:45pm - 6:15pm PST
cloud

5:45pm PST

Smokey and the Multi-Armed Bandit featuring BERT Reynolds
Using the popular Transformers library from Hugging Face, I will train and deploy multiple natural language understanding (NLU) models. I will then compare these models in live production using a multi-armed bandit to dynamically shift traffic to the winning model throughout the live experiment.

Speakers
avatar for Chris Fregly

Chris Fregly

Developer Advocate, AI and Machine Learning, AWS


Friday November 13, 2020 5:45pm - 6:15pm PST
data
 