Machine Learning REPA Week 2021

Conference on Machine Learning Engineering, MLOps and Management topics

05 - 11 April 2021
Our goal is

to gather best practices and solutions for Machine Learning engineering and process automation, learn about the best open-source tools, and share team collaboration and product management insights from a variety of industries and applications.

Main topics
Track 1: Machine Learning project and teamwork organization
  • How to organize your project and code?
  • How to strengthen your team collaboration?
  • How to manage project growth?
Track 2: ML pipelines automation. Code and Data version control. Reproducibility
  • How to build and automate pipelines?
  • Version control for your code, data and pipelines
  • Tools and practices in machine learning applications
  • Reproducibility of machine learning pipelines
Track 3: ML experiments management and metrics tracking. Model Lifecycle Management
  • How to manage ML experiments and metrics tracking?
  • What tools to use?
  • Model Lifecycle and Development process
Track 4: CI/CD and MLOps in ML
  • How to build a production-ready solution with a model/pipeline you've developed?
  • What is MLOps and how to make it work
  • Build CI/CD for Machine Learning
Track 5: Testing and Monitoring in Machine Learning
  • Testing in Machine Learning
  • Deploying your solution is not the end of the story!
  • How to monitor that your model works as expected?
  • Tools and integrations for monitoring deployed models
Program
April 5
April 6
April 7
April 8
April 9
April 10
April 11
Track 1
4:00 pm - Opening Talk: ML REPA Week 2021
4:20 pm - What is the Maturity Model in Data Science? - Yuliya Rubtsova
5:00 pm - Top 5 Reasons Why Your ML Project Didn't Make It and How to Get It Right the First Time - Irina Kukuyeva, Ph.D.
Track 2
4:00 pm - Opening Talk: ML REPA Week 2021
4:20 pm - DVC: data versioning and ML experiments on top of Git - Dmitry Petrov
5:00 pm - ZenML - Hamza Tahir
Track 1: Machine Learning project and teamwork organization
Lean Data Science: agile practices for Data Science projects
Askhat Urazbaev, Agile coach, Founder @ LeanDS
In this talk, we will explore collaborative techniques that guide data science teams in their agile adoption. We will discuss how to come up with clear product hypotheses, how to prioritise them using the ICE/RICE method, how to decompose huge AI epics into small, easy-to-validate data science hypotheses, and how to effectively manage work using Kanban and Scrum approaches.
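For readers new to ICE/RICE, here is a minimal sketch, in Python, of how a backlog of product hypotheses can be ranked; the hypotheses and numbers below are purely illustrative, and the formula used is the standard RICE score (Reach * Impact * Confidence / Effort).

# Minimal RICE scoring sketch for prioritising product hypotheses (illustrative data).
hypotheses = [
    # name, reach (users/quarter), impact (0.25-3), confidence (0-1), effort (person-months)
    {"name": "Churn-risk score in CRM", "reach": 5000, "impact": 2.0, "confidence": 0.8, "effort": 3},
    {"name": "Auto-tagging of support tickets", "reach": 20000, "impact": 1.0, "confidence": 0.5, "effort": 2},
    {"name": "Demand forecast for top-10 SKUs", "reach": 300, "impact": 3.0, "confidence": 0.7, "effort": 4},
]

for h in hypotheses:
    h["rice"] = h["reach"] * h["impact"] * h["confidence"] / h["effort"]

# Highest-scoring hypotheses go to the top of the backlog
for h in sorted(hypotheses, key=lambda x: x["rice"], reverse=True):
    print(f'{h["rice"]:>8.1f}  {h["name"]}')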
Wrong but useful: turning ML models into ML products
Elena Samuylova, CEO & Co-founder Evidently AI
When working on machine learning projects, we often focus on technical challenges and building accurate models. However, a model itself is not a product. To solve the business problem at hand, we need to consider a wider set of requirements. In this talk, I will share a set of questions to think through when working on enterprise ML solutions to make sure your models get to production.
    How to become a good (team) manager in Machine Learning?
    Alexander Moiseev, Head of Product Analytics, Capital Markets @ Raiffeisenbank
    The key goal of a good manager is to enable the team to achieve new heights.
    The main tools: encouraging expertise and knowledge sharing within the team, improving team members' visibility, and developing a culture of continuous learning.

    This talk focuses on the best simple tips and tricks for newcomers to the manager path. We will not speak about common practices. The talk will focus on the worst mistakes that all new managers make and on good, simple things that help everyone become a good manager in a short time.
    Architecture of Machine Learning systems
    Michael Perlin, Machine Learning Engineer @ Volkswagen
    A happy moment: the ML model leaves the notebook to start benefiting the business. The data scientist then faces the question of how to integrate it: there are usually many possibilities, many different decisions have to be made, and it is often unclear how to approach them.
    Software architecture is the discipline responsible for this. What does it involve? What skills and qualities does it require? Can a data scientist master it? Who to call for help?
    This talk is less about technology and more about processes, strategies and people.
    AI development process: Common mistakes
    Kseniia Melnikova, Product Owner (Data/AI) @ SoftwareOne
    Let's talk about the main problems and common mistakes of the AI dev process!

    In this talk I will cover the full development cycle of building ML models: data preparation and data version control, code version control, metrics tracking and hyperparameter tuning. We will discuss the weak points of each component and the mistakes that you and your team are probably making. I will also provide solutions and a list of AI tools that will help you cover your process in a more systematic way.
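As an illustration of the metrics-tracking point above, here is a minimal, tool-agnostic sketch, in Python, of recording the parameters, metrics and code version of each run so that experiments can be compared and reproduced later; the file and parameter names are hypothetical.

# Minimal, tool-agnostic experiment log: params + metrics + code version per run.
import json
import subprocess
import time
from pathlib import Path

params = {"model": "lightgbm", "learning_rate": 0.05, "num_leaves": 31}
metrics = {"roc_auc": 0.87, "precision_at_10": 0.42}  # values produced by your evaluation step

run = {
    "timestamp": time.strftime("%Y-%m-%dT%H-%M-%S"),
    "git_commit": subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip(),
    "params": params,
    "metrics": metrics,
}

log_dir = Path("experiments")
log_dir.mkdir(exist_ok=True)
(log_dir / f"run_{run['timestamp']}.json").write_text(json.dumps(run, indent=2))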
    Structuring machine learning projects
    Yerzat Marat, Project manager @ Knowtions Research
    In this talk, we will cover a set of tools and principles that the speaker has found useful, with practical case breakdowns.
    The 3 components your "Agile AI" product development stack should include
    Ashley Beattie, Head of DevOps Transformation @ Agile By Design
    What tools and techniques should I have in my toolbox when it comes to delivering AI products?
    1. What are the goals of an AI Product development system?
    2. Creating Valuable AI Products: the techniques, tools and approaches to understand, design, develop and test an AI opportunity
    3. Rapidly testing AI Product hypotheses: how to slice your opportunity to define an MVP that matters
    5 Principles of LeanML
    Laszlo Sragner, Founder @ Hypergolic
    In this talk, I will introduce LeanML in 5 key takeaways. Lean Machine Learning (LeanML) is a framework that enables Data Science teams to build business-oriented data products through a deliberate process. LeanML was created to address the difficulties of dealing with data-centric workflows and outcomes. It is inspired by techniques and know-how from disciplines in quant trading, business intelligence, agile software engineering and strategic consulting.
    Top 5 Reasons why your ML project didn't make it and how to get it right the first time
    Irina Kukuyeva, Ph.D.
    Every Data Scientist/ML Engineer is hired to bring value to the business and is expected to develop and iterate on data products that help the company grow. But not every data analytics project is a data product. This talk, based on 25+ collaborations with companies of all industries and sizes, will cover 5 of the most common reasons ML projects don't make it and what's necessary to upgrade your data analytics project into a data product. You will learn:

    • How to better collaborate with your stakeholders
    • What to ask before the project begins
    • What to watch out for as you're developing data products
    • What software requirements you should be aware of
    • What resources you need to have
    By the end of the session, the audience will have a better understanding of the technical and organizational considerations for iterating on data initiatives, and walk away with practical advice on how to help your company get a return on its data investment and become more data-driven.
    Track 2: ML pipelines automation. Code and Data version control. Reproducibility
    DVC: data versioning and ML experiments on top of Git
    Dmitry Petrov, Creator of DVC - Data Version Control - Git for machine learning. Now co-founder & CEO of Iterative.ai. Ex-Data Scientist at Microsoft. PhD in Computer Science.
    ML practitioners rapidly experiment to optimize for the best results or analyze
    different subsets of data. Experiments need to be reproducible, both to recover
    and tweak experiments and to instill confidence in the final results.
    Reproducibility across experiments becomes more difficult as the data size and
    project complexity increase. The data and code to generate each experiment must
    be tracked, and running the entire pipeline from scratch may be infeasible.

    With the open-source tool DVC, hundreds of experiments can be tracked
    automatically. DVC tracks data, code, and metrics together, keeping the code and
    metadata in a Git repository while caching the data anywhere the user chooses.
    This approach scales with large data and complex projects, ensuring fully
    reproducible results that make experimentation efficient and easy.
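As a rough sketch (not from the talk) of how a DVC-tracked stage typically looks: the script below reads hyperparameters from a params.yaml and writes a metrics.json, so that a stage declared in dvc.yaml can be re-run with dvc repro and compared across experiments; the file names, parameters and values are illustrative.

# train.py: a stage that dvc repro might run, as declared in dvc.yaml.
# DVC versions params.yaml, the input data, and metrics.json for each experiment.
import json
import yaml  # pip install pyyaml

with open("params.yaml") as f:
    params = yaml.safe_load(f)["train"]  # e.g. {"n_estimators": 100, "max_depth": 5}

# ... load the tracked dataset and fit a model using `params` ...
accuracy = 0.91  # placeholder for the real evaluation result

# DVC picks this file up as the stage's metrics output
with open("metrics.json", "w") as f:
    json.dump({"accuracy": accuracy}, f, indent=2)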
    Building ML Pipelines with Dagster:
    The role of the orchestrator in machine learning
    Sandy Ryza, Software Engineer at Elementl, working on Dagster.
    There would be no machine learning models without features, and there would be no features without data pipelines. Orchestrators help data scientists and ML engineers assemble durable pipelines out of the data transformations that define their features. Dagster is an orchestrator that puts data at the center. While orchestrators typically focus on sequencing computations in production, Dagster brings orchestration to the entire ML development lifecycle. It helps engineers and data scientists answer questions like:

    • How will a change to how I model my data affect the performance of my ML model?
    • What data and code were used to train this model?
    • How can I test my ML pipelines?
    • How can I try out changes to my feature without messing up my production data?
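Not from the talk, but as a minimal sketch of what a Dagster job can look like (decorator names have changed across Dagster versions; this uses the op/job API, and the data-loading and training steps are stand-ins):

# Minimal Dagster job: two ops wired into a small training pipeline.
from dagster import job, op

@op
def load_features():
    # stand-in for reading features produced by your data pipeline
    return [[0.1, 0.2], [0.3, 0.4]], [0, 1]

@op
def train_model(data):
    features, labels = data
    # stand-in for real model training; return something serializable
    return {"n_samples": len(features), "positive_rate": sum(labels) / len(labels)}

@job
def training_pipeline():
    train_model(load_features())

if __name__ == "__main__":
    result = training_pipeline.execute_in_process()
    print(result.output_for_node("train_model"))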
    Evolutionary automation of ML pipelines with FEDOT Framework
    Nikolay Nikitin, Senior Research Fellow
    @ National Center for Cognitive Technologies, ITMO University
    I plan to talk about the AutoML solutions for classification, regression, clustering, and time series forecasting implemented in the open-source FEDOT framework (https://github.com/nccr-itmo/FEDOT).
    The framework allows building modeling pipelines with a heterogeneous structure that can consist of blocks of different types (for example, ML models, equation-based models, NLP models, neural networks, data preprocessing blocks, and even atomized pipelines) and have a multiscale or multimodal nature (for example, a model predicting different components of a time series separately can be built automatically for a time series forecasting task). The framework also makes it possible to "export" the obtained model and data in order to improve the reproducibility of AutoML-based experiments.
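As a rough sketch of how FEDOT's high-level API is used (based on the project's README; the exact import path and arguments are assumptions and may differ between framework versions):

# Rough sketch of FEDOT's high-level AutoML API; check the README for the
# exact import path and arguments in your version.
import numpy as np
from fedot.api.main import Fedot

X = np.random.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# FEDOT searches for a pipeline (preprocessing + models) for the given task
model = Fedot(problem="classification", timeout=2)
model.fit(features=X, target=y)
predictions = model.predict(features=X)
print(model.get_metrics())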
    Track 4: CI/CD and MLOps in ML. Testing in ML
    Automating Machine Learning with GitHub Actions & GitLab CI
    Elle O'Brien, Lecturer @ University of Michigan School of Information,
    Data Scientist @ Iterative, Inc.
    Machine learning is maturing as a discipline: now that it's trivially easy to create and train models, it's never been more challenging to manage the complexity of experiments, changing datasets, and the demands of a full-stack project. In this talk, we'll examine why one of the staples of DevOps, continuous integration, has been so challenging to implement in ML projects so far and how it can be done using open-source tools like Git, GitHub Actions, and DVC (Data Version Control).
    We'll also discuss a new open source project (Continuous Machine Learning) created to adapt popular continuous integration systems like GitHub Actions and GitLab CI to data science projects. We'll cover example use cases, including automated model testing in a standardized environment, getting detailed reporting on model behavior in a pull request, and training models on cloud GPUs.
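Not from the talk, but as an illustration of the automated model testing use case: a check like the sketch below could be run by a GitHub Actions or GitLab CI job after training, failing the pipeline when the freshly trained model falls below an agreed threshold (the file name, metric and threshold are hypothetical).

# check_metrics.py: a quality gate a CI job could run after training.
# Fails the build (non-zero exit) if the model regresses below a threshold.
import json
import sys

THRESHOLD = 0.85  # agreed minimum accuracy; adjust per project

with open("metrics.json") as f:
    metrics = json.load(f)

accuracy = metrics["accuracy"]
print(f"accuracy = {accuracy:.3f} (threshold {THRESHOLD})")

if accuracy < THRESHOLD:
    sys.exit("Model accuracy is below the threshold; failing the CI job.")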
    Tutorials
    Develop End To End Scalable ML Pipeline With Kubeflow
    Ritaban Chowdhury, Machine Learning Engineer @ RiiidLabs
    I am going to talk about how to productionize an ML product using Kubeflow.

    85% of ML models are not used in production settings. In industry, if you do not test something in a production setting, it does not generate value. Kubeflow is a platform that makes prototyping ML models easier and more scalable. We will learn the very basics of this platform.
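A minimal sketch of what a Kubeflow pipeline definition can look like with the kfp SDK (v1-style API; the component code, names and images are illustrative stand-ins):

# Minimal Kubeflow Pipelines sketch using the kfp v1 SDK.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def preprocess() -> str:
    # stand-in for a real preprocessing step; returns a dataset identifier
    return "dataset-v1"

def train(dataset: str):
    print(f"training on {dataset}")

preprocess_op = create_component_from_func(preprocess, base_image="python:3.9")
train_op = create_component_from_func(train, base_image="python:3.9")

@dsl.pipeline(name="toy-training-pipeline", description="Preprocess, then train")
def pipeline():
    data = preprocess_op()
    train_op(dataset=data.output)

if __name__ == "__main__":
    # Compile to a spec that can be uploaded to a Kubeflow cluster
    kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")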
    Better Code Quality for Data Science
    Julia Antokhina, DS at Mobile TeleSystems
    Tutorials on writing better code for Data Science
    Reproducibility of ML solutions in seismic interpretation project
    Alexey Kozhevin, Data Scientist at Gazprom-Neft.
    Solving seismic interpretation tasks with neural networks
    Reproducibility of ML solutions in seismic interpretation is important at all stages of work: from data loading and deploying a production environment to model training and metrics evaluation.

    We will show what open-source tools our team has developed and how we organized the full cycle of work on the project, using the example of fault detection on seismic data.
    Kubeflow pipelines for Object detection models on the edge
    Imad Eddine Ibrahim Bekkouch,
    Data Scientist @ Provectus,
    PhD student at Sorbonne University, Paris
    This workshop will start with a small presentation of object detection models and which ones are most suitable for running on edge devices with real-time inference speed. The next step is to configure and run a Kubeflow pipeline locally and set the hyperparameters used for model training. Finally, we will look at the results of several experiments and compare the best models.
    Registration
    We are going to use the ML REPA School platform to run our conference online. Please register and book your place at Machine Learning REPA Week 2021!
    Register me
    Organizers
    Our partners
    email: info@ml-repa.ru
    telegram: t.me/mlrepa
    See you at ML REPA Week 2021!