Roadmap review and retrospective 2020: MLOps

The Roadmap

Architecture

  • Improve my knowledge and skills as a systems architect;
  • Monolith to “smaller services” journey.

MLOps

  • MLOps definition;
  • Acquire machine learning knowledge;
  • Explore the main differences between machine learning development and software development.

Python

  • Improve my coding and testing skills.

Cloud Providers

  • Improve my Google Cloud Platform (GCP) knowledge;
  • Improve my Amazon Web Services (AWS) knowledge;
  • Get certification in one of the cloud providers.

Kubernetes

  • Acquire and evolve my knowledge of the Kubernetes platform.

Monitoring

  • Improve my knowledge of how to build a monitoring platform: logs, metrics, and tracing.

DevOpsDays Portugal 2020 edition

  • Organize the second DevOpsDays Portugal conference.

Reading

  • Read at least 5 books.

Blog

  • Write at least 3 posts.

The Review

A new year at a new company, with a lot of new things to learn. The main topic was Machine Learning, since it’s the core discipline of Deeper Insights, my new company.

Architecture

I improved my knowledge of systems architecture, since this year’s work included:

  • Breaking a monolith into smaller services. Of course, this was only the first step of a long journey, given the size and complexity of the monolith.
  • Creating new systems from scratch.

MLOps

To do a good job as an Operations Engineer, I needed to acquire knowledge about Machine Learning, since my mission was to support Data Scientists in developing, delivering, and supporting ML models. The main learnings were:

  • Understand the ML model lifecycle: train, build, deploy and monitor;
  • Identify the main differences between ML development and (classical) software development;
  • Understand the Data Scientists’ dialect in order to speak the same language.
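
To make the four lifecycle stages concrete, here is a minimal toy sketch in Python. It is not what we ran in production: the "model" simply predicts the mean of its training data, and the function names, the JSON artifact format, and the error threshold are all assumptions made for illustration.

```python
import json
import statistics
from typing import Callable, List


def train(samples: List[float]) -> dict:
    """Train: fit a toy 'model' that always predicts the training mean."""
    return {"mean": statistics.fmean(samples)}


def build(model: dict) -> str:
    """Build: package the trained model as a versioned, serialized artifact."""
    # The JSON layout and the version field are hypothetical.
    return json.dumps({"version": 1, "model": model})


def deploy(artifact: str) -> Callable[[], float]:
    """Deploy: load the artifact and expose a predict function."""
    model = json.loads(artifact)["model"]
    return lambda: model["mean"]


def monitor(predict: Callable[[], float], observed: List[float], max_error: float) -> bool:
    """Monitor: return True while the mean absolute error stays within max_error."""
    errors = [abs(predict() - value) for value in observed]
    return statistics.fmean(errors) <= max_error
```

Chaining the stages, `deploy(build(train([1.0, 2.0, 3.0])))` gives a predict function, and `monitor` can then be called periodically with freshly observed values to decide whether the model needs retraining.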

Python

Since Python was the official programming language used by the company, it wasn’t difficult to adopt it in the context of Operations. I can’t say that I’m a good developer, but I improved my Python skills enough not to annoy the people who review my code.

Cloud Providers

My daily work included creating and managing resources on AWS and GCP, which allowed me to improve my knowledge of both cloud providers. However, I wasn’t able to reserve time to prepare for and get a certification from either provider.

Kubernetes

With containerized applications as the default option and Kubernetes as the orchestration platform, I acquired and developed a good knowledge of the platform’s setup, configuration, and use.

Monitoring

Before starting to build a monitoring platform, one question needs to be answered: why build a monitoring platform instead of buying one? From experience, there isn’t a single monitoring tool that you can buy that covers all of your company’s use cases. In all the companies I have worked for so far, multiple monitoring tools were used.

The next step was to define the monitoring platform components, where the plan was to support logs, metrics, and tracing. While reading the book Practical Monitoring, I learned that the best approach to building a monitoring platform is to adopt the user’s perspective. Normally, a user asks the following questions:

  • Is the system online?
  • Is the system working as expected?

So, the plan was to implement the monitoring components in the following order:

  1. Healthcheck: Is the system or application up?
  2. Metrics: Is the system or application presenting the expected behavior?
  3. Logs: What’s the system or application doing?
  4. Tracing: What’s the full story of a request?

During the year, I tested different tools for each of these components.
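
As an illustration of the four components in the order listed above, here is a minimal sketch using only the Python standard library. It does not represent any of the tools I tested: the in-memory `metrics` counter, the `handle_request` function, and the `trace=` log field are hypothetical stand-ins for a real metrics backend, application handler, and tracing system.

```python
import logging
import time
import uuid
from collections import Counter
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("app")

# 2. Metrics: in-memory counters (stand-in for a real metrics backend).
metrics = Counter()


def healthcheck() -> dict:
    """1. Healthcheck: is the application up?"""
    return {"status": "ok"}


def handle_request(payload: str, trace_id: Optional[str] = None) -> dict:
    """Process one request while emitting metrics, logs, and a trace ID."""
    # 4. Tracing: a single ID follows the request through every log line.
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    metrics["requests_total"] += 1
    try:
        if not payload:
            raise ValueError("empty payload")
        result = payload.upper()
        # 3. Logs: what is the application doing?
        log.info("trace=%s handled payload of %d chars", trace_id, len(payload))
        return {"trace_id": trace_id, "result": result}
    except ValueError:
        metrics["errors_total"] += 1
        log.exception("trace=%s request failed", trace_id)
        raise
    finally:
        # Accumulate total processing time in milliseconds.
        metrics["duration_ms_total"] += (time.perf_counter() - start) * 1000
```

The healthcheck answers "is the system online?", while the error and duration counters help answer "is the system working as expected?". The logs and the trace ID only become necessary once one of those two answers is "no".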

DevOpsDays Portugal 2020 edition

The 2020 edition was canceled due to COVID-19.

Reading

I was able to read 18 books in 2020.

Blog

I only published 2 posts. This is definitely something to improve.

