Roadmap review and retrospective 2020: MLOps

The Roadmap

Architecture

  • Improve my knowledge and skills as a systems architect;
  • Monolith to “smaller services” journey.

MLOps

  • MLOps definition;
  • Acquire machine learning knowledge;
  • Explore the main differences between machine learning development and software development.

Python

  • Improve my coding and testing skills.

Cloud Providers

  • Improve my Google Cloud Platform (GCP) knowledge;
  • Improve my Amazon Web Services (AWS) knowledge;
  • Get a certification in one of the cloud providers.

Kubernetes

  • Acquire and evolve my knowledge of the Kubernetes platform.

Monitoring

  • Improve my knowledge around how to build a monitoring platform: logs, metrics and tracing.

DevOpsDays Portugal 2020 edition

  • Organize the second DevOpsDays Portugal conference.

Reading

  • Read at least 5 books.

Blog

  • Write at least 3 posts.

The Review

A new year in a new company, with a lot of new things to learn. The main topic was Machine Learning, since it’s the core discipline of Deeper Insights, my new company.

Architecture

I improved my knowledge of systems architecture, since this year’s work included:

  • Breaking a monolith into smaller services. Of course, it was only the first step of a long journey due to the size and complexity of the monolith.
  • Creating new systems from scratch.

MLOps

In order to do a good job as an Operations Engineer, it was important to acquire knowledge about Machine Learning since my mission was to support Data Scientists developing, delivering, and supporting ML models. The main learnings were:

  • Understand the ML model lifecycle: train, build, deploy and monitor;
  • Identify the main difference between ML development and (classical) software development;
  • Understand the Data Scientist dialect in order to talk the same language.

Python

Since Python was the official development language, and it also works well as a scripting language, it wasn’t difficult to adopt it in an operations context. I can’t say that I’m a good developer, but I improved my Python skills enough not to annoy the people who review my code.

Cloud Providers

My daily work included creating and managing resources on AWS and GCP and that allowed me to improve my knowledge of both cloud providers. However, I wasn’t able to reserve time to prepare and get a certification from at least one of the providers.

Kubernetes

With containerization as the default option for applications and Kubernetes as the orchestration platform, I acquired and developed good knowledge of the platform’s setup, configuration, and use.

Monitoring

Before starting to build a monitoring platform, one question needs to be answered: why build a monitoring platform instead of buying one? From experience, there isn’t a single monitoring tool that you can buy that covers all of your company’s use cases. In all the companies where I have worked so far, multiple monitoring tools were used.

The next step was to define the monitoring platform components, where the plan was to support logs, metrics, and tracing. While reading the book Practical Monitoring, I learned that the best approach to building a monitoring platform is to adopt the user’s perspective. Normally, a user asks the following questions:

  • Is the system online?
  • Is the system working as expected?

So, the plan was to implement the monitoring components in the following order:

  1. Healthcheck: Is the system or application up?
  2. Metrics: Is the system or application presenting the expected behavior?
  3. Logs: What’s the system or application doing?
  4. Tracing: What is the full story behind a request?

During the year, different tools were tested for each of these components.
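
To make the first step of that order concrete, here is a minimal healthcheck sketch in Python. It is only an illustration, not one of the tools that were tested: the names HealthResult, run_healthcheck, and http_probe are hypothetical, and the sketch assumes that a system counts as “up” when a probe function finishes without raising an exception.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthResult:
    """Outcome of a single healthcheck run (hypothetical structure)."""
    name: str
    up: bool
    latency_ms: float
    detail: str = ""


def run_healthcheck(name: str, probe: Callable[[], None]) -> HealthResult:
    """Run a probe and report whether the system looks up.

    Assumption: the probe raises an exception when the system is down
    and returns normally when it is up.
    """
    start = time.perf_counter()
    try:
        probe()
        up, detail = True, "ok"
    except Exception as exc:
        up, detail = False, f"{type(exc).__name__}: {exc}"
    latency_ms = (time.perf_counter() - start) * 1000
    return HealthResult(name, up, latency_ms, detail)


def http_probe(url: str, timeout: float = 2.0) -> Callable[[], None]:
    """Build a probe that performs an HTTP GET against a health endpoint.

    urllib raises an error for connection failures, timeouts, and
    4xx/5xx responses, which run_healthcheck reports as "down".
    """
    def probe() -> None:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    return probe
```

A call such as run_healthcheck("api", http_probe("http://localhost:8080/health")) would answer “is the system online?” for a service exposing that endpoint (the URL is a placeholder). The measured latency is also a natural first value to feed into the metrics step.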

DevOpsDays Portugal 2020 edition

The 2020 edition was cancelled due to COVID-19.

Reading

I was able to read 18 books during 2020.

Blog

I only published 2 posts. This is definitely something to improve.

