Machine Learning REPA Week 2021

Conference on Machine Learning Engineering, MLOps and Management topics

05 - 11 April 2021
Our goal is

to gather best practices and solutions for Machine Learning engineering and process automation, learn about the best open-source tools, and share team collaboration and product management insights from a variety of industries and applications.

Main topics
Track 1: Machine Learning project and teamwork organization
  • How to organize your project and code?
  • How to strengthen your team collaboration?
  • How to manage project growth?
Track 2: ML pipelines automation. Code and Data version control. Reproducibility
  • How to build and automate pipelines?
  • Version control for your code, data and pipelines
  • Tools and practices in machine learning applications
  • Reproducibility of machine learning pipelines
Track 3: ML experiments management and metrics tracking. Model Lifecycle Management
  • How to manage ML experiments and metrics tracking?
  • What tools to use?
  • Model Lifecycle and Development process
Track 4: CI/CD and MLOps in ML
  • How to build a production-ready solution with a model/pipeline you've developed?
  • What is MLOps and how to make it work
  • Build CI/CD for Machine Learning
Track 5: Testing and Monitoring in Machine Learning
  • Testing in Machine Learning
  • Deploying your solution is not the end of the story!
  • How to monitor that your model works as expected?
  • Tools and integrations for monitoring deployed models
Program
April 5
April 6
April 7
April 8
April 9
April 10
April 11
Track 1
4:00 pm - Opening Talk: ML REPA Week 2021
4:20 pm - What is the Maturity Model in Data Science? - Yuliya Rubtsova
5:00 pm - Top 5 Reasons Why Your ML Project Didn't Make It and How to Get It Right the First Time - Irina Kukuyeva, Ph.D.
Track 2
4:00 pm - Opening Talk: ML REPA Week 2021
4:20 pm - DVC: data versioning and ML experiments on top of Git - Dmitry Petrov
5:00 pm - ZenML - Hamza Tahir
Track 1: Machine Learning project and teamwork organization
Lean Data Science: agile practices for Data Science projects
Askhat Urazbaev, Agile coach, Founder @ LeanDS
In this talk, we will explore collaborative techniques that guide data science teams in their agile adoption. We will discuss how to come up with clear product hypotheses, how to prioritise them using the ICE/RICE method, how to decompose huge AI epics into small, easy-to-validate data science hypotheses, and how to effectively manage work using Kanban and Scrum approaches.
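For readers new to ICE/RICE, here is a minimal sketch, in Python, of how a backlog of product hypotheses can be ranked; the hypotheses and numbers below are purely illustrative, and the formula used is the standard RICE score (Reach * Impact * Confidence / Effort).

# Minimal RICE scoring sketch for prioritising product hypotheses (illustrative data).
hypotheses = [
    # name, reach (users/quarter), impact (0.25-3), confidence (0-1), effort (person-months)
    {"name": "Churn-risk score in CRM", "reach": 5000, "impact": 2.0, "confidence": 0.8, "effort": 3},
    {"name": "Auto-tagging of support tickets", "reach": 20000, "impact": 1.0, "confidence": 0.5, "effort": 2},
    {"name": "Demand forecast for top-10 SKUs", "reach": 300, "impact": 3.0, "confidence": 0.7, "effort": 4},
]

for h in hypotheses:
    h["rice"] = h["reach"] * h["impact"] * h["confidence"] / h["effort"]

# Highest-scoring hypotheses go to the top of the backlog
for h in sorted(hypotheses, key=lambda x: x["rice"], reverse=True):
    print(f'{h["rice"]:>8.1f}  {h["name"]}')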
Wrong but useful: turning ML models into ML products
Elena Samuylova, CEO & Co-founder Evidently AI
When working on machine learning projects, we often focus on technical challenges and building accurate models. However, a model itself is not a product. To solve the business problem at hand, we need to consider a wider set of requirements. In this talk, I will share a set of questions to think through when working on enterprise ML solutions to make sure your models get to production.
    How to become a good (team) manager in Machine Learning?
    Alexander Moiseev, Head of Product Analytics, Capital Markets @ Raiffeisenbank
    The key goal of a good manager is to enable the team to achieve new heights.
    The main tools: encouraging expertise and knowledge sharing within the team, improving team members' visibility, and developing a culture of continuous learning.

    This talk focuses on the best simple tips and tricks for newcomers to the manager path. We will not speak about common practices. The talk will focus on the worst mistakes that all new managers make and on good, simple things that help everyone become a good manager in a short time.
    Architecture of Machine Learning systems
    Michael Perlin, Machine Learning Engineer @ Volkswagen
    A happy moment: the ML model leaves the notebook to start benefiting the business. The data scientist then faces the question of how to integrate it: there are usually many possibilities, many different decisions have to be made, and it is often unclear how to approach them.
    Software architecture is the discipline responsible for this. What does it involve? What skills and qualities does it require? Can a data scientist master it? Who to call for help?
    This talk is less about technology and more about processes, strategies and people.
    AI development process: Common mistakes
    Kseniia Melnikova, Product Owner (Data/AI) @ SoftwareOne
    Let's talk about the main problems and common mistakes of the AI dev process!

    In this talk I will cover the full development cycle of building ML models: data preparation and data version control, code version control, metrics tracking and hyperparameter tuning. We will discuss the weak points of each component and the mistakes that you and your team are probably making. I will also provide solutions and a list of AI tools that will help you cover your process in a more systematic way.
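As an illustration of the metrics-tracking point above, here is a minimal, tool-agnostic sketch, in Python, of recording the parameters, metrics and code version of each run so that experiments can be compared and reproduced later; the file and parameter names are hypothetical.

# Minimal, tool-agnostic experiment log: params + metrics + code version per run.
import json
import subprocess
import time
from pathlib import Path

params = {"model": "lightgbm", "learning_rate": 0.05, "num_leaves": 31}
metrics = {"roc_auc": 0.87, "precision_at_10": 0.42}  # values produced by your evaluation step

run = {
    "timestamp": time.strftime("%Y-%m-%dT%H-%M-%S"),
    "git_commit": subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip(),
    "params": params,
    "metrics": metrics,
}

log_dir = Path("experiments")
log_dir.mkdir(exist_ok=True)
(log_dir / f"run_{run['timestamp']}.json").write_text(json.dumps(run, indent=2))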
    Structuring machine learning projects
    Yerzat Marat, Project manager @ Knowtions Research
    In this talk, we will cover a set of tools and principles that the speaker has found useful, with practical case breakdowns.
    The 3 components your "Agile AI" product development stack should include
    Ashley Beattie, Head of DevOps Transformation @ Agile By Design
    What tools and techniques should I have in my toolbox when it comes to delivering AI products?
    1. What are the goals of an AI Product development system?
    2. Creating Valuable AI Products: the techniques, tools and approaches to understand, design, develop and test an AI opportunity
    3. Rapidly testing AI Product hypotheses: how to slice your opportunity to define an MVP that matters
    5 Principles of LeanML
    Laszlo Sragner, Founder @ Hypergolic
    In this talk, I will introduce LeanML in 5 key takeaways. Lean Machine Learning (LeanML) is a framework that enables Data Science teams to build business-oriented data products through a deliberate process. LeanML was created to address the difficulties of dealing with data-centric workflows and outcomes. It is inspired by techniques and know-how from disciplines in quant trading, business intelligence, agile software engineering and strategic consulting.
    Top 5 Reasons why your ML project didn't make it and how to get it right the first time
    Irina Kukuyeva, Ph.D.
    Every Data Scientist/ML Engineer is hired to bring value to the business and is expected to develop and iterate on data products that help the company grow. But not every data analytics project is a data product. This talk, based on 25+ collaborations with companies of all industries and sizes, will cover 5 of the most common reasons ML projects don't make it and what's necessary to upgrade your data analytics project into a data product. You will learn:

    • How to better collaborate with your stakeholders
    • What to ask before the project begins
    • What to watch out for as you're developing data products
    • What software requirements you should be aware of
    • What resources you need to have
    By the end of the session, the audience will have a better understanding of the technical and organizational considerations for iterating on data initiatives, and walk away with practical advice on how to help your company get a return on its data investment and become more data-driven.
    Track 2: ML pipelines automation. Code and Data version control. Reproducibility
    DVC: data versioning and ML experiments on top of Git
    Dmitry Petrov, Creator of DVC - Data Version Control - Git for machine learning. Now co-founder & CEO of Iterative.ai. Ex-Data Scientist at Microsoft. PhD in Computer Science.
    ML practitioners rapidly experiment to optimize for the best results or analyze
    different subsets of data. Experiments need to be reproducible, both to recover
    and tweak experiments and to instill confidence in the final results.
    Reproducibility across experiments becomes more difficult as the data size and
    project complexity increase. The data and code to generate each experiment must
    be tracked, and running the entire pipeline from scratch may be infeasible.

    With the open-source tool DVC, hundreds of experiments can be tracked
    automatically. DVC tracks data, code, and metrics together, keeping the code and
    metadata in a Git repository while caching the data anywhere the user chooses.
    This approach scales with large data and complex projects, ensuring fully
    reproducible results that make experimentation efficient and easy.
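As a rough sketch (not from the talk) of how a DVC-tracked stage typically looks: the script below reads hyperparameters from a params.yaml and writes a metrics.json, so that a stage declared in dvc.yaml can be re-run with dvc repro and compared across experiments; the file names, parameters and values are illustrative.

# train.py: a stage that dvc repro might run, as declared in dvc.yaml.
# DVC versions params.yaml, the input data, and metrics.json for each experiment.
import json
import yaml  # pip install pyyaml

with open("params.yaml") as f:
    params = yaml.safe_load(f)["train"]  # e.g. {"n_estimators": 100, "max_depth": 5}

# ... load the tracked dataset and fit a model using `params` ...
accuracy = 0.91  # placeholder for the real evaluation result

# DVC picks this file up as the stage's metrics output
with open("metrics.json", "w") as f:
    json.dump({"accuracy": accuracy}, f, indent=2)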
    Building ML Pipelines with Dagster:
    The role of the orchestrator in machine learning
    Sandy Ryza, Software Engineer at Elementl, working on Dagster.
    There would be no machine learning models without features, and there would be no features without data pipelines. Orchestrators help data scientists and ML engineers assemble durable pipelines out of the data transformations that define their features. Dagster is an orchestrator that puts data at the center. While orchestrators typically focus on sequencing computations in production, Dagster brings orchestration to the entire ML development lifecycle. It helps engineers and data scientists answer questions like:

    • How will a change to how I model my data affect the performance of my ML model?
    • What data and code were used to train this model?
    • How can I test my ML pipelines?
    • How can I try out changes to my feature without messing up my production data?
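Not from the talk, but as a minimal sketch of what a Dagster job can look like (decorator names have changed across Dagster versions; this uses the op/job API, and the data-loading and training steps are stand-ins):

# Minimal Dagster job: two ops wired into a small training pipeline.
from dagster import job, op

@op
def load_features():
    # stand-in for reading features produced by your data pipeline
    return [[0.1, 0.2], [0.3, 0.4]], [0, 1]

@op
def train_model(data):
    features, labels = data
    # stand-in for real model training; return something serializable
    return {"n_samples": len(features), "positive_rate": sum(labels) / len(labels)}

@job
def training_pipeline():
    train_model(load_features())

if __name__ == "__main__":
    result = training_pipeline.execute_in_process()
    print(result.output_for_node("train_model"))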
    Evolutionary automation of ML pipelines with FEDOT Framework
    Nikolay Nikitin, Senior Research Fellow
    @ National Center for Cognitive Technologies, ITMO University
    I plan to talk about the AutoML solutions for classification, regression, clustering, and time series forecasting implemented in the open-source FEDOT framework (https://github.com/nccr-itmo/FEDOT).
    The framework allows building modeling pipelines with a heterogeneous structure that can consist of blocks of different types (for example, ML models, equation-based models, NLP models, neural networks, data preprocessing blocks, and even atomized pipelines) and have a multiscale or multimodal nature (for example, a model predicting different components of a time series separately can be built automatically for a time series forecasting task). The framework also makes it possible to "export" the obtained model and data in order to improve the reproducibility of AutoML-based experiments.
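As a rough sketch of how FEDOT's high-level API is used (based on the project's README; the exact import path and arguments are assumptions and may differ between framework versions):

# Rough sketch of FEDOT's high-level AutoML API; check the README for the
# exact import path and arguments in your version.
import numpy as np
from fedot.api.main import Fedot

X = np.random.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# FEDOT searches for a pipeline (preprocessing + models) for the given task
model = Fedot(problem="classification", timeout=2)
model.fit(features=X, target=y)
predictions = model.predict(features=X)
print(model.get_metrics())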
    Track 4: CI/CD and MLOps in ML. Testing in ML
    Automating Machine Learning with GitHub Actions & GitLab CI
    Elle O'Brien, Lecturer @ University of Michigan School of Information,
    Data Scientist @ Iterative, Inc.
    Machine learning is maturing as a discipline: now that it's trivially easy to create and train models, it's never been more challenging to manage the complexity of experiments, changing datasets, and the demands of a full-stack project. In this talk, we'll examine why one of the staples of DevOps, continuous integration, has been so challenging to implement in ML projects so far and how it can be done using open-source tools like Git, GitHub Actions, and DVC (Data Version Control).
    We'll also discuss a new open source project (Continuous Machine Learning) created to adapt popular continuous integration systems like GitHub Actions and GitLab CI to data science projects. We'll cover example use cases, including automated model testing in a standardized environment, getting detailed reporting on model behavior in a pull request, and training models on cloud GPUs.
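Not from the talk, but as an illustration of the automated model testing use case: a check like the sketch below could be run by a GitHub Actions or GitLab CI job after training, failing the pipeline when the freshly trained model falls below an agreed threshold (the file name, metric and threshold are hypothetical).

# check_metrics.py: a quality gate a CI job could run after training.
# Fails the build (non-zero exit) if the model regresses below a threshold.
import json
import sys

THRESHOLD = 0.85  # agreed minimum accuracy; adjust per project

with open("metrics.json") as f:
    metrics = json.load(f)

accuracy = metrics["accuracy"]
print(f"accuracy = {accuracy:.3f} (threshold {THRESHOLD})")

if accuracy < THRESHOLD:
    sys.exit("Model accuracy is below the threshold; failing the CI job.")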
    Tutorials
    Develop End To End Scalable ML Pipeline With Kubeflow
    Ritaban Chowdhury, Machine Learning Engineer @ RiiidLabs
    I am going to talk about how to productionize an ML product using Kubeflow.

    85% of ML models are not used in production settings. In industry, if you do not test something in a production setting, it does not generate value. Kubeflow is a platform that makes prototyping ML models easier and more scalable. We will learn the very basics of this platform.
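A minimal sketch of what a Kubeflow pipeline definition can look like with the kfp SDK (v1-style API; the component code, names and images are illustrative stand-ins):

# Minimal Kubeflow Pipelines sketch using the kfp v1 SDK.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def preprocess() -> str:
    # stand-in for a real preprocessing step; returns a dataset identifier
    return "dataset-v1"

def train(dataset: str):
    print(f"training on {dataset}")

preprocess_op = create_component_from_func(preprocess, base_image="python:3.9")
train_op = create_component_from_func(train, base_image="python:3.9")

@dsl.pipeline(name="toy-training-pipeline", description="Preprocess, then train")
def pipeline():
    data = preprocess_op()
    train_op(dataset=data.output)

if __name__ == "__main__":
    # Compile to a spec that can be uploaded to a Kubeflow cluster
    kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")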
    Better Code Quality for Data Science
    Julia Antokhina, DS at Mobile TeleSystems
    Tutorials on writing better code for Data Science
    Reproducibility of ML solutions in seismic interpretation project
    Alexey Kozhevin, Data Scientist at Gazprom-Neft.
    Solving seismic interpretation tasks with neural networks
    Reproducibility of ML solutions in seismic interpretation is important at all stages of work: from data loading and deploying a production environment to model training and metrics evaluation.

    We will show what open-source tools our team has developed and how we organized the full cycle of work on the project, using the example of fault detection on seismic data.
    Kubeflow pipelines for Object detection models on the edge
    Imad Eddine Ibrahim Bekkouch,
    Data Scientist @ Provectus,
    PhD student at Sorbonne University, Paris
    This workshop will start with a small presentation of object detection models and which ones are most suitable for running on edge devices with real-time inference speed. The next step is to configure and run a Kubeflow pipeline locally and set the hyperparameters used for model training. Finally, we will look at the results of several experiments and compare the best models.
    Registration
    We are going to use the ML REPA School platform to run our conference online. Please register and book your place at Machine Learning REPA Week 2021!
    Register me
    Organizers
    Our partners
    email: info@ml-repa.ru
    telegram: t.me/mlrepa
    See you at ML REPA Week 2021!