Machine Learning REPA Week 2021

Free Online Conference on Machine Learning Engineering, MLOps and Management practices

05 - 11 April 2021
Our goal is

to share good practices and solutions for Machine Learning engineering and process automation, learn the best open-source tools, and share team collaboration and product management insights from across various industries and applications.

Main topics
Topic 1: Machine Learning Product and Team Management
  • How to organize your project and code?
  • How to strengthen your team collaboration?
  • How to manage project growth?
Topic 2: ML pipelines automation. Code and Data version control. Reproducibility
  • How to build and automate pipelines?
  • Version control for your code, data and pipelines
  • Tools and practices in machine learning applications
  • Reproducibility of machine learning pipelines
Topic 3: ML experiments management and metrics tracking. Model Lifecycle Management
  • How to manage ML experiments and metrics tracking?
  • What tools to use?
  • Model Lifecycle and Development process
Topic 4: CI/CD and MLOps in ML
  • How to build a production ready solution with a model/pipeline you've developed?
  • What is MLOps and how to make it work
  • Build CI/CD for Machine Learning
Topic 5: Testing and Monitoring in Machine Learning
  • Testing in Machine Learning
  • Deploying your solution is not the end of the story!
  • How to monitor that your model works properly?
  • Tools and integrations for monitoring deployed models
Program
Every day we have two parallel tracks. We start at 7:00 pm Moscow time (9:00 am Los Angeles time).
April 5
Track 1: Machine Learning Product and Team Management
Yuliya Rubtsova, PhD, Solution architect @ Datamonsters
Track 2: ML Pipelines Automation, Engineering & MLOps
Dmitry Petrov, Creator of DVC, Co-founder & CEO @ Iterative.ai
Hamza Tahir, ZenML, Co-creator
Mohamed Sabri, Data Science and MLOps specialist
April 6
Track 1: Machine Learning Product and Team Management
Kseniia Melnikova, Product Owner (Data/AI), SoftwareOne
Ashley Beattie, Agile By Design - Head of DevOps Transformation
Yerzat Marat, Knowtions Research, Project manager
Track 2: ML Pipelines Automation, Engineering & MLOps
Alexander Mokryak, ML Engineer @ Exness
Soumanta Das, Yugen.ai, Co-Founder
Sandy Ryza, Software Engineer @ Elementl, working on Dagster
Mikhail Rozhkov, Co-Creator @ ML REPA, Solution Engineer @ Iterative.ai
April 7
Track 1: Machine Learning Product and Team Management
Askhat Urazbaev, Agile coach, Founder of LeanDS
Alexey Mogilnikov, Senior Data Scientist at SBDA Group, Chief Methodologist at LeanDS
Vasilia Gainulina, Senior Product Manager, Beeline Big Data
Track 2: ML Pipelines Automation, Engineering & MLOps
Timur Dzhumakaev, MegaFon, Senior DS
Elle O'Brien, Lecturer @ University of Michigan School of Information, Data Scientist @ Iterative, Inc
Andrey Velichkevich, Senior Software Engineer at Cisco, contributor to Kubeflow
Ben Epstein, Machine Learning Lead @ Splice Machine
April 8
Track 1: Machine Learning Product and Team Management
Laszlo Sragner, Hypergolic, Founder quant research, mobile gaming, fintech NLP startup
Elena Samuylova, CEO & Co-founder Evidently AI
Alexander Moiseev, Head of Product Analytics, Capital Markets, Raiffeisenbank
Track 2: ML Pipelines Automation, Engineering & MLOps
Nikolay Nikitin, Senior Research Fellow @ National Center for Cognitive Technologies, ITMO University
Flavio Clesio, Machine Learning Engineer in Berlin
Andrey Lukyanenko, Data Scientist @ MTS AI, Kaggle Competition Master, Notebook 1st rank
April 9
Track 1: Machine Learning Product and Team Management
Michael Perlin, Machine Learning Engineer at Volkswagen
Artemy Malkov, Data Monsters, CEO, Lecturer at MIPT, PhD
Track 2: ML Pipelines Automation, Engineering & MLOps (tutorials)
Julia Antokhina, Data Scientist @ Mobile TeleSystems
Alexey Kozhevin, Data Scientist in Gazprom Neft
Elena Vilkova, ML Engineer @ ABN AMRO
April 10
Track 1: Machine Learning Product and Team Management
Track 2: ML Pipelines Automation, Engineering & MLOps (tutorials)
Nikolay Nikitin, Senior Research Fellow @ National Center for Cognitive Technologies, ITMO University
Ritaban Chowdhury, Machine Learning Engineer @ RiiidLabs
Yee Tong, Backend engineer from Lyft and Climate Corporation
Katrina Rogan, Backend engineer previously at Lyft and Google
April 11
Track 1: Machine Learning Product and Team Management
Track 2: ML Pipelines Automation, Engineering & MLOps (tutorials)
Imad BEKKOUCH, Data Scientist @ Provectus
Mikhail Rozhkov, Co-Creator @ ML REPA, Solution Engineer @ Iterative.ai
Speakers
Track 1: Machine Learning Product and Team Management
Lean Data Science: agile practices for Data Science projects
Askhat Urazbaev, Agile coach, Founder @ LeanDS
In this talk, we will explore collaborative techniques that guide data science teams in their agile adoption. We will discuss how to come up with clear product hypotheses, how to prioritise them using the ICE/RICE method, how to decompose huge AI epics into small, easy-to-validate data science hypotheses, and how to effectively manage work using Kanban and Scrum approaches.
Wrong but useful: turning ML models into ML products
Elena Samuylova, CEO & Co-founder Evidently AI
When working on machine learning projects, we often focus on technical challenges and building accurate models. However, a model itself is not a product. To solve the business problem at hand, we need to consider a wider set of requirements. In this talk, I will share a set of questions to think through when working on enterprise ML solutions to make sure your models get to production.
    How to become a good (team) manager in Machine Learning?
    Alexander Moiseev, Head of Product Analytics, Capital Markets @ Raiffeisenbank
    The key goal of a good manager is to help the team reach new heights.
    Main tools: encourage expertise and knowledge sharing within the team, improve team members' visibility, and develop a culture of continuous learning.

    This talk focuses on simple tips and tricks for newcomers to the management path. We will not cover common practices. Instead, the talk will focus on the worst mistakes that all new managers make and on simple, effective things that help anyone become a good manager in a short time.
    Architecture of Machine Learning systems
    Michael Perlin, Machine Learning Engineer @ Volkswagen
    A happy moment: the ML model leaves the notebook to start benefiting the business. A DS then faces the question of how to integrate it: there are usually many options, many different decisions have to be made, and it is often unclear how to approach them.
    Software architecture is the discipline responsible for this. What does it involve? What skills and qualities does it require? Can a DS master it? Who to call for help?
    This talk is less about technology and more about processes, strategies and people.
    AI development process: Common mistakes
    Kseniia Melnikova, Product Owner(Data/AI) @ SoftwareOne
    Let's talk about the main problems and common mistakes of the AI dev process!

    In this talk I will cover the full development cycle of building ML models: data preparation and data version control, code version control, metrics tracking and hyperparameter tuning. We will discuss the weak points of each component and the mistakes that you and your team are probably making. I will share solutions and a list of AI tools that will help you cover your whole process in a more sophisticated way.
    Structuring machine learning projects
    Yerzat Marat, Project manager @ Knowtions Research
    Yerzat has 4+ years of experience delivering AI products and projects across various industries. His main interest lies in AI product management and in how to deliver value most effectively using ML/DS, from concept through to production.

    In this talk, we will cover a set of tools and principles that the speaker has found useful, with practical case breakdowns.
    The 3 components your "Agile AI" product development stack should include
    Ashley Beattie, Head of DevOps Transformation @ Agile By Design
    What tools and techniques should I have in my toolbox when it comes to delivering AI products?
    1. What are the goals of an AI Product development system?
    2. Creating Valuable AI Products: the techniques, tools and approaches to understand, design, develop and test an AI opportunity
    3. Rapidly testing AI Product hypotheses: how to slice your opportunity to define an MVP that matters
    5 Principles of LeanML
    Laszlo Sragner, Founder @ Hypergolic
    In this talk, I will introduce LeanML in 5 key takeaways. Lean Machine Learning (LeanML) is a framework that enables Data Science teams to build business-oriented data products through a deliberate process. LeanML was created to address the difficulties of dealing with data-centric workflows and outcomes. It is inspired by techniques and know-how from quant trading, business intelligence, agile software engineering and strategic consulting.
    Your AI Project is a QUEST. Don't get eaten by DRAGONS!
    Artemy Malkov, PhD
    Data Monsters, CEO / Lecturer at MIPT / NVIDIA Elite Service Delivery Partner
    You're in big trouble when your team can't deliver a working AI solution on schedule.
    There's a reason. Research projects may take 10 times longer than scheduled.
    Learn about QUEST, MISSION, Decision-To-Be-Made, 3D-MAPping, RAIDs, CRAFT, LABYRINTH, DRAGONS and TREASURE - new lean data science tools and frameworks.
    Join this session to upgrade your AI management skills to mythic levels and let your team have fun and be productive.
    Track 2: ML pipelines automation. Code and Data version control. Reproducibility. MLOps
    DVC: data versioning and ML experiments on top of Git
    Dmitry Petrov, Creator of DVC - Data Version Control - Git for machine learning. Now co-founder & CEO of Iterative.ai. Ex-Data Scientist at Microsoft. PhD in Computer Science.
    ML practitioners rapidly experiment to optimize for the best results or analyze
    different subsets of data. Experiments need to be reproducible, both to recover
    and tweak experiments and to instill confidence in the final results.
    Reproducibility across experiments becomes more difficult as the data size and
    project complexity increase. The data and code to generate each experiment must
    be tracked, and running the entire pipeline from scratch may be infeasible.

    With the open-source tool DVC, hundreds of experiments can be tracked
    automatically. DVC tracks data, code, and metrics together, keeping the code and
    metadata in a Git repository while caching the data anywhere the user chooses.
    This approach scales with large data and complex projects, ensuring fully
    reproducible results that make experimentation efficient and easy.
    Building ML Pipelines with Dagster: The role of the orchestrator in machine learning
    Sandy Ryza, Software Engineer at Elementl, working on Dagster.
    There would be no machine learning models without features, and there would be no features without data pipelines. Orchestrators help data scientists and ML engineers assemble durable pipelines out of the data transformations that define their features. Dagster is an orchestrator that puts data at the center. While orchestrators typically focus on sequencing computations in production, Dagster brings orchestration to the entire ML development lifecycle. It helps engineers and data scientists answer questions like:

    • How will a change to how I model my data affect the performance of my ML model?
    • What data and code were used to train this model?
    • How can I test my ML pipelines?
    • How can I try out changes to my feature without messing up my production data?
    Evolutionary automation of ML pipelines with FEDOT Framework
    Nikolay Nikitin, Senior Research Fellow @ National Center for Cognitive Technologies, ITMO University
    I plan to talk about the AutoML solutions for classification, regression, clustering, and time series forecasting implemented in the open-source FEDOT Framework (https://github.com/nccr-itmo/FEDOT).
    The framework allows building modeling pipelines with a heterogeneous structure that can consist of blocks of different types (for example, ML models, equation-based models, NLP models, neural networks, data preprocessing blocks, and even atomized pipelines) and have a multiscale or multimodal nature (for example, a model predicting different components of a time series separately can be built automatically for a time series forecasting task). The framework also makes it possible to "export" the obtained model and data in order to improve the reproducibility of AutoML-based experiments.
    Automating Machine Learning with GitHub Actions & GitLab CI
    Elle O'Brien, Lecturer @University of Michigan School of Information
    Data Scientist @ Iterative, Inc
    Machine learning is maturing as a discipline: now that it's trivially easy to create and train models, it's never been more challenging to manage the complexity of experiments, changing datasets, and the demands of a full-stack project. In this talk, we'll examine why one of the staples of DevOps, continuous integration, has been so challenging to implement in ML projects so far and how it can be done using open-source tools like Git, GitHub Actions, and DVC (Data Version Control).
    We'll also discuss a new open source project (Continuous Machine Learning) created to adapt popular continuous integration systems like GitHub Actions and GitLab CI to data science projects. We'll cover example use cases, including automated model testing in a standardized environment, getting detailed reporting on model behavior in a pull request, and training models on cloud GPUs.
    Develop End To End Scalable ML Pipeline With Kubeflow
    Ritaban Chowdhury, Machine Learning Engineer @ RiiidLabs
    I am going to talk about how to productionalize an ML product using Kubeflow.

    85% of ML models are not used in production settings. In industry, if you do not test something in production settings, it does not generate value. Kubeflow is a platform that makes prototyping ML models easier and more scalable. We will learn the very basics of this platform.
    Better Code Quality for Data Science
    Julia Antokhina, Data Scientist at Mobile TeleSystems
    What you'll learn?
    In this tutorial, I'll show the full pipeline for setting up code quality checks and tests in your repository. I'll focus on quick wins and solutions for common difficulties. These tools seem simple, but they can help you a lot, not only to improve your own skills but also to work better as a team of data scientists.
    Reproducibility of ML solutions in seismic interpretation project
    Alexey Kozhevin, Data Scientist at Gazprom-Neft.
    Solving seismic interpretation tasks with neural networks
    Reproducibility of ML solutions in seismic interpretation is important at all stages of work: from data loading and deploying a production environment to model training and metrics evaluation.

    We will show which open-source tools our team has developed and how we organized the full cycle of work on the project, using the example of fault detection on seismic data.
    Kubeflow pipelines for Object detection models on the edge
    Imad Eddine Ibrahim BEKKOUCH
    Provectus, Data Scientist
    PhD at Sorbonne University, Paris (in progress)
    This workshop will start with a short presentation of object detection models and which ones are most suitable for running on edge devices with real-time inference speed. The next step is to configure and run a Kubeflow pipeline locally and configure the hyperparameters used for model training. Finally, we will look at the results of several experiments and compare the best models.
    Why you should start writing ML pipelines from training day 0
    Hamza Tahir, ZenML, Co-creator
    What you'll learn?
    We will learn why ML pipelines are important, why they are hard to create, and why it's valuable to have them written as early as possible while developing ML models.
    The intended audience is data scientists and ML engineers who are interested in learning how to bridge the experimentation and production phases of the machine learning development lifecycle.

    What is unique about this talk?
    Bridging the gap between the experimentation and the productionalization phase of the machine learning workflow
      MLflow: creating experiments and logging metrics in Databricks and MLflow Tracking Servers
      Elena Vilkova, ABN AMRO, ML Engineer
      I work as a Platform and ML Engineer at the Dutch bank ABN AMRO, enabling Advanced Analytics projects. Our purpose is to build a robust and standard platform that DSs, DAs and MLEs can use for a fast, complete ML lifecycle from exploration to production.

      What you'll learn?
      Each ML model should be fully controlled! After the tutorial you will understand how MLflow works and how to use MLflow Tracking for logging parameters and metrics of your ML model. We will look at the difference between models, experiments and runs. I will show you the UI capabilities for ML management in both Databricks and MLflow Tracking servers. We will also discuss how an MLflow Tracking server is organised and how it can be deployed.
      Writing reusable training pipelines for deep learning
      Andrey Lukyanenko, MTS AI, Data Scientist
      4 years as ERP-consultant, 4 years as Data Scientist
      Kaggle Competition Master, Notebook 1st rank
      What you'll learn?
      Attendees will learn when writing a custom training pipeline is worth the effort and what important functionality it should have, and will see an example of such a pipeline.
      This talk could be interesting to those who have already started training deep learning models and want to make the training more systematic.

      What is unique about this talk?
      In this talk, I'll show a working training pipeline for deep learning based on PyTorch Lightning as a wrapper over PyTorch code and Hydra for managing configuration files.

      Personal website: https://andlukyane.com/activities
      Twitter: https://twitter.com/AndLukyane
      VCS repository structure for ML projects
      Timur Dzhumakaev, MegaFon, Senior DS
      Nearly 5 years of experience in the field in various roles: backend developer, ML engineer and data scientist
      What you'll learn?
      Data scientists can learn about the benefits of VCS, which can store results of many experiments and enhance communication both within and outside the team.
      ML engineers/developers can formulate their requirements for DS better: with the right demands, they can focus more on performance optimization and less on refactoring.

      What is unique about this talk?
      This talk is about an essential task for IT projects: setting up a VCS repository. Although version control for ML projects is known to be difficult, this topic is rarely discussed.
      Reducing the distance between Prototyping and Production - Why obsessing over experimentation and iteration compounds ROIs
      Soumanta Das, Yugen.ai, Co-Founder
      7+ years (Data Science Consulting, Product, ML Systems)
      A case study on how setting Minimum Achievable Goals and continuous improvement can help realize value and establish a strong ML engineering culture.

      What you'll learn?
      • Bridging the gap from Prototype to Production by keeping small, achievable impact goals
      • How frequent iteration and continuous experimentation can increase ROIs and set the path for a strong ML engineering culture
      • How to balance model development, deployments, architecture improvements and monitoring keeping in mind business goals

      Who should attend your talk/tutorial?
      • Data Scientists and ML Engineers looking to learn how to prioritize their efforts and workflows
      • Hopefully anyone working at a startup trying to build an ML team can gain from our experiences

      From Jupyter Notebook to Reproducible and Automated experiments & MLOps for batch scoring applications
      Mikhail Rozhkov, Co-Creator @ ML REPA, Solution Engineer @ Iterative.ai
      I'm a co-creator of the Machine Learning REPA project and ML REPA School, and the author of online courses on ML experiment automation and MLOps with DVC. I have over 6 years of hands-on experience in Machine Learning & Data Science, lead projects, and help teams implement good tools and engineering practices. Recently I joined the Iterative.ai team as a Solution Engineer.

      What you'll learn?
      • How to organize a team workflow
      • How to work with DVC, MLflow, and Airflow together
      • How to organize a basic configuration file for ML projects
      • How to automate ML experiments with DVC
      Who should attend your talk/tutorial?
      • Data Scientists and ML Engineers
      • Team Leads and ML Project Managers
      Flyte: Accelerate your ML and Data Workflows to production
      Katrina Rogan, Backend engineer previously at Lyft and Google
      Experience working on data pipelines for mapping, travel search and ad performance reporting
      Yee Tong, Backend engineer from Lyft and Climate Corporation
      Passion for functional programming and orchestration. Deep experience with the US mortgage market. Seattle native and involved with Flyte since inception.

      What you'll learn?
      Come to learn about Flyte (https://flyte.org), a modern approach to cloud-native orchestration that enables and accelerates reproducible, consistent and scalable pipelines from local to production.

      Attend if you:
      • Run production pipelines
      • Want to use a cloud-native pipeline orchestration engine that has been used in production at scale at large companies like Lyft, Spotify and Freenome
      • Want to learn about https://flyte.org and the benefits of using a specification-based pipeline engine
      • Want to run executions independently within an organization without worrying about resource scaling
      • Want to use Kubernetes and Docker in a simple, easy-to-use way, with no YAML files or complicated APIs to write
      MLOps and AutoML in Cloud-Native Way with Kubeflow and Katib
      Andrey Velichkevich, Senior Software Engineer at Cisco
      Andrey Velichkevich is a Senior Software Engineer at Cisco and is one of the major contributors to the Kubeflow open-source project.

      He is a co-chair of the AutoML working group and co-lead of the Training working group. Andrey hosts Kubeflow community meetings for the AutoML and Training working groups, organises community webinars and writes blog posts. In addition, Andrey helps the community drive the CI/CD infrastructure and contributes to the ML benchmark system for Kubeflow.
      Registration
      We are going to use the ML REPA School platform to host our conference online. Please register and book your place at Machine Learning REPA Week 2021!
      email: info@ml-repa.ru
      telegram: t.me/mlrepa
      See you at ML REPA Week 2021!