Lab 4: ML Operations

caution

You are viewing this lab from the handbook. This lab is meant to be loaded as a Cloud Shell tutorial. Please see the labs section for how to do so.

In this lab, we will build a machine learning model to assess, in real-time, whether incoming transactions are fraudulent or legitimate. Using Vertex AI Pipelines (based on Kubeflow), we will streamline the end-to-end ML workflow, from data preprocessing to model deployment, ensuring scalability and efficiency in fraud detection.

In Vertex AI, custom containers allow you to define and package your own execution environment for machine learning workflows. To store custom container images, create a repository in Artifact Registry:

gcloud artifacts repositories create bootkon --repository-format=docker --location=us-central1

Let's create two container images: one for training and one for serving predictions.

The training container consists of the following files. Please have a look at them:

  • train/Dockerfile: Executes the training script.
  • train/train.py: Downloads the data set from BigQuery, trains a machine learning model, and uploads the model to Cloud Storage (a minimal sketch of such a script follows this list).
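
Below is a minimal, hedged sketch of what such a training script might look like, assuming a scikit-learn classifier; the project ID, BigQuery table, bucket, and the "Class" label column are illustrative placeholders rather than the lab's exact values:

# Hypothetical sketch only; PROJECT_ID, BQ_TABLE, MODEL_BUCKET and the
# "Class" label column are placeholders, not the lab's exact values.
import joblib
from google.cloud import bigquery, storage
from sklearn.ensemble import RandomForestClassifier

PROJECT_ID = "<PROJECT_ID>"
BQ_TABLE = "<PROJECT_ID>.<DATASET>.<TABLE>"
MODEL_BUCKET = "<YOUR_BUCKET>"

# 1. Download the data set from BigQuery into a DataFrame.
df = bigquery.Client(project=PROJECT_ID).query(f"SELECT * FROM `{BQ_TABLE}`").to_dataframe()

# 2. Train a model; "Class" is assumed to be the fraud label column.
X, y = df.drop(columns=["Class"]), df["Class"]
model = RandomForestClassifier(n_estimators=100).fit(X, y)

# 3. Upload the trained model artifact to Cloud Storage.
joblib.dump(model, "model.joblib")
storage.Client(project=PROJECT_ID).bucket(MODEL_BUCKET).blob("model/model.joblib").upload_from_filename("model.joblib")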

The serving container image follows the same pattern: a Dockerfile in src/ml/predict that runs the serving application, which loads the trained model from Cloud Storage and answers prediction requests.
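
If you are curious what such a serving application could look like, here is a minimal, hedged sketch using Flask and a model saved as model.joblib; the lab's actual code may differ, but any custom serving container is expected to honour the AIP_* environment variables and routes that Vertex AI injects:

import os
import joblib
from flask import Flask, jsonify, request
from google.cloud import storage

app = Flask(__name__)

# Vertex AI injects these environment variables into custom serving containers.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
STORAGE_URI = os.environ.get("AIP_STORAGE_URI", "gs://<YOUR_BUCKET>/model")  # model directory

# Download the model artifact written by the training job (the file name is an assumption).
bucket_name, _, prefix = STORAGE_URI.removeprefix("gs://").partition("/")
storage.Client().bucket(bucket_name).blob(f"{prefix}/model.joblib").download_to_filename("model.joblib")
model = joblib.load("model.joblib")

@app.route(HEALTH_ROUTE)
def health():
    return jsonify({"status": "healthy"}), 200

@app.route(PREDICT_ROUTE, methods=["POST"])
def predict():
    # Vertex AI wraps the request body in an "instances" list.
    instances = request.get_json()["instances"]
    predictions = model.predict(instances).tolist()
    return jsonify({"predictions": predictions}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))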

We can create the container images using Cloud Build, which allows you to build a Docker image from just a Dockerfile. The next command builds the training image in Cloud Build and pushes it to Artifact Registry (replace <PROJECT_ID> with your project ID):

(cd src/ml/train && gcloud builds submit --region=us-central1 --tag=us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-train:latest --quiet)

Let’s do the same for the serving image:

(cd src/ml/predict && gcloud builds submit --region=us-central1 --tag=us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-predict:latest --quiet)

Vertex AI Pipelines

Now, have a look at pipeline.py. This script uses the Kubeflow domain-specific language (DSL) to orchestrate the following machine learning workflow (a condensed sketch follows the list):

  1. CustomTrainingJobOp trains the model.
  2. ModelUploadOp uploads the trained model to the Vertex AI model registry.
  3. EndpointCreateOp creates a prediction endpoint for inference.
  4. ModelDeployOp deploys the model from step 2 to the endpoint from step 3.
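
The sketch below shows how these four steps might be wired together with the google-cloud-pipeline-components library; the image URIs, machine type, and model artifact location are illustrative placeholders and not necessarily what pipeline.py uses:

# Condensed, hedged sketch of a pipeline like the one in pipeline.py.
from kfp import dsl
from google_cloud_pipeline_components.types import artifact_types
from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp
from google_cloud_pipeline_components.v1.endpoint import EndpointCreateOp, ModelDeployOp
from google_cloud_pipeline_components.v1.model import ModelUploadOp

TRAIN_IMAGE = "us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-train:latest"
PREDICT_IMAGE = "us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-predict:latest"

@dsl.pipeline(name="bootkon-pipeline")
def pipeline(project: str, location: str, model_artifact_uri: str):
    # 1. Train the model in a custom job running the training container.
    train = CustomTrainingJobOp(
        project=project,
        location=location,
        display_name="bootkon-training-job",
        worker_pool_specs=[{
            "machine_spec": {"machine_type": "n1-standard-4"},
            "replica_count": 1,
            "container_spec": {"image_uri": TRAIN_IMAGE},
        }],
    )

    # Pair the model artifact written by the training job with the serving container.
    model_artifact = dsl.importer(
        artifact_uri=model_artifact_uri,
        artifact_class=artifact_types.UnmanagedContainerModel,
        metadata={"containerSpec": {"imageUri": PREDICT_IMAGE}},
    ).after(train)

    # 2. Upload the trained model to the Vertex AI Model Registry.
    upload = ModelUploadOp(
        project=project,
        location=location,
        display_name="bootkon-model",
        unmanaged_container_model=model_artifact.outputs["artifact"],
    )

    # 3. Create an endpoint for online predictions (runs in parallel with training).
    endpoint = EndpointCreateOp(
        project=project,
        location=location,
        display_name="bootkon-endpoint",
    )

    # 4. Deploy the uploaded model to the endpoint.
    ModelDeployOp(
        model=upload.outputs["model"],
        endpoint=endpoint.outputs["endpoint"],
        deployed_model_display_name="bootkon-model",
        dedicated_resources_machine_type="n1-standard-4",
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
    )

A definition like this is typically compiled with kfp's compiler and submitted as an aiplatform.PipelineJob, which is presumably the kind of run that executing pipeline.py kicks off.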

Let’s execute it:

python src/ml/pipeline.py

The pipeline run will take around 20 minutes to complete. While waiting, please read the introduction to Vertex AI Pipelines.

Custom Training Job

The pipeline creates a custom training job – let’s inspect it in the Cloud Console once it has completed:

  1. Open Vertex AI Console
  2. Click Training in the navigation menu
  3. Click Custom jobs
  4. Click bootkon-training-job

Note the container image it uses and the arguments that are passed to the container (the BigQuery dataset and the project ID).

Model Registry

Once the training job has finished, the resulting model is uploaded to the model registry. Let’s have a look:

  1. Click Model Registry in the navigation menu
  2. Click bootkon-model
  3. Click VERSION DETAILS

Here you can see that a model in the Vertex AI Model Registry is made up of a container image and a model artifact location. When you deploy a model, Vertex AI simply starts the container and points it at the artifact location.
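
The same pairing can also be expressed programmatically. This is a hedged sketch with the Vertex AI SDK, where the bucket path is a placeholder for wherever the training job wrote its artifact:

from google.cloud import aiplatform

aiplatform.init(project="<PROJECT_ID>", location="us-central1")

# A Model Registry entry is just a serving container image plus the
# artifact location it should load.
model = aiplatform.Model.upload(
    display_name="bootkon-model",
    artifact_uri="gs://<YOUR_BUCKET>/model/",  # placeholder artifact location
    serving_container_image_uri="us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-predict:latest",
)
print(model.resource_name)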

Endpoint for Predictions

The endpoint is created in a parallel branch of the pipeline you just ran. You can also deploy models to an endpoint through the Model Registry.

  1. Click Online Prediction in the navigation menu
  2. Click bootkon-endpoint

You can see that the endpoint currently has one model deployed, and all traffic is routed to it (traffic split is 100%). When you scroll down, you will see live graphs once predictions start coming in.

You can also train and deploy models on Vertex AI entirely through the UI. Let's take a closer look. Click Edit Settings. Here you can find many options for model monitoring – why don't you try enabling prediction drift detection?

Vertex AI Pipelines

Let’s have a look at the Pipeline as well.

  1. Click Pipelines in the navigation menu
  2. Click bootkon-pipeline-…

You can now see the individual steps in the pipeline. Please click through them and have a look at the Pipeline run analysis on the right-hand side as you cycle through the pipeline steps.

Click Expand Artifacts. You can now see expanded yellow boxes; these are Vertex AI artifacts created as a result of the preceding step.

Feel free to explore the UI in more detail on your own!

Making predictions

Now that the model has been deployed to the endpoint, we can send transactions to it and have them assessed as fraudulent or legitimate. We will use curl to call the endpoint.

Have a look at predict.sh. On line 9 it uses curl to call the endpoint, sending a data file named instances.json that contains 3 transactions.

Let’s execute it:

./src/ml/predict.sh

The result should be a JSON object with a predictions key containing a prediction for each of the 3 transactions: 1 means fraud and 0 means non-fraud.
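
If you prefer to call the endpoint from Python rather than curl, the Vertex AI SDK offers an equivalent. In this hedged sketch the feature values are dummy placeholders and must match the number and order of features the model was trained on:

from google.cloud import aiplatform

aiplatform.init(project="<PROJECT_ID>", location="us-central1")

# Look up the endpoint created by the pipeline by its display name.
endpoint = aiplatform.Endpoint.list(filter='display_name="bootkon-endpoint"')[0]

# Each instance is one transaction's feature vector (dummy values shown here).
instances = [
    [0.0] * 28 + [100.0],
    [0.1] * 28 + [250.0],
]
response = endpoint.predict(instances=instances)
print(response.predictions)  # e.g. [0, 1] -> non-fraud, fraud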

Success

Congratulations, intrepid ML explorer! 🚀 You’ve successfully wrangled data, trained models, and unleashed the power of Vertex AI. If your model underperforms, remember: it’s not a bug—it’s just an underfitting feature! Keep iterating, keep optimizing, and may your loss functions always converge. Happy coding! 🤖✨
