Lab 4: ML Operations
caution
You are viewing this lab from the handbook. This lab is meant to be loaded as a Cloud Shell tutorial. Please see the labs section on how to do so.
In this lab, we will build a machine learning model to assess, in real-time, whether incoming transactions are fraudulent or legitimate. Using Vertex AI Pipelines (based on Kubeflow), we will streamline the end-to-end ML workflow, from data preprocessing to model deployment, ensuring scalability and efficiency in fraud detection.
In Vertex AI, custom containers allow you to define and package your own execution environment for machine learning workflows. To store custom container images, create a repository in Artifact Registry:

```bash
gcloud artifacts repositories create bootkon --repository-format=docker --location=us-central1
```
Let us create two container images, one for training and one for serving predictions.
The training container comprises the following files. Please have a look at them:

- `train/Dockerfile`: Executes the training script.
- `train/train.py`: Downloads the dataset from BigQuery, trains a machine learning model, and uploads the model to Cloud Storage.
The serving container image comprises the following files:

- `predict/Dockerfile`: Executes the serving script to answer requests.
- `predict/predict.py`: Downloads the model from Cloud Storage, loads it, and answers prediction requests on port 8080.
We can create the container images using Cloud Build, which allows you to build a Docker image using just a Dockerfile. The next command builds the image in Cloud Build and pushes it to Artifact Registry:
```bash
(cd src/ml/train && gcloud builds submit --region=us-central1 --tag=us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-train:latest --quiet)
```
Let’s do the same for the serving image:
```bash
(cd src/ml/predict && gcloud builds submit --region=us-central1 --tag=us-central1-docker.pkg.dev/<PROJECT_ID>/bootkon/bootkon-predict:latest --quiet)
```
Vertex AI Pipelines
Now, have a look at `pipeline.py`. This script uses the Kubeflow domain-specific language (DSL) to orchestrate the following machine learning workflow:
1. `CustomTrainingJobOp` trains the model.
2. `ModelUploadOp` uploads the trained model to the Vertex AI Model Registry.
3. `EndpointCreateOp` creates a prediction endpoint for inference.
4. `ModelDeployOp` deploys the model from step 2 to the endpoint from step 3.
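Conceptually, `pipeline.py` wires these operations together roughly as follows. This is a hedged sketch, not the actual file: the project ID, bucket, machine types, and parameter values are assumptions, and running it requires the `kfp` and `google-cloud-pipeline-components` packages.

```python
# Sketch of a Vertex AI pipeline; names and parameter values are illustrative.
from kfp import dsl
from kfp.dsl import importer
from google_cloud_pipeline_components.types import artifact_types
from google_cloud_pipeline_components.v1.custom_job import CustomTrainingJobOp
from google_cloud_pipeline_components.v1.model import ModelUploadOp
from google_cloud_pipeline_components.v1.endpoint import EndpointCreateOp, ModelDeployOp

PROJECT = "your-project-id"  # assumption: replace with your project ID
REGION = "us-central1"
TRAIN_IMAGE = f"{REGION}-docker.pkg.dev/{PROJECT}/bootkon/bootkon-train:latest"
SERVE_IMAGE = f"{REGION}-docker.pkg.dev/{PROJECT}/bootkon/bootkon-predict:latest"

@dsl.pipeline(name="bootkon-pipeline")
def pipeline(model_dir: str = "gs://your-bucket/model"):  # assumption
    # Step 1: run the training container as a custom job.
    train = CustomTrainingJobOp(
        project=PROJECT, location=REGION, display_name="bootkon-training-job",
        worker_pool_specs=[{
            "machine_spec": {"machine_type": "n1-standard-4"},
            "replica_count": 1,
            "container_spec": {"image_uri": TRAIN_IMAGE},
        }],
    )
    # Step 2: register the trained model artifact with the serving image.
    model_src = importer(
        artifact_uri=model_dir,
        artifact_class=artifact_types.UnmanagedContainerModel,
        metadata={"containerSpec": {"imageUri": SERVE_IMAGE}},
    ).after(train)
    upload = ModelUploadOp(
        project=PROJECT, location=REGION, display_name="bootkon-model",
        unmanaged_container_model=model_src.outputs["artifact"],
    )
    # Step 3: create the endpoint (this branch runs in parallel with training).
    endpoint = EndpointCreateOp(
        project=PROJECT, location=REGION, display_name="bootkon-endpoint",
    )
    # Step 4: deploy the uploaded model to the endpoint.
    ModelDeployOp(
        model=upload.outputs["model"],
        endpoint=endpoint.outputs["endpoint"],
        dedicated_resources_machine_type="n1-standard-2",
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
    )
```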
Let’s execute it:
```bash
python src/ml/pipeline.py
```
The pipeline run will take around 20 minutes to complete. While waiting, please read the introduction to Vertex AI Pipelines.
Custom Training Job
The pipeline creates a custom training job. Let's inspect it in the Cloud Console once it has completed:

- Open the Vertex AI Console
- Click **Training** in the navigation menu
- Click **Custom jobs**
- Click **bootkon-training-job**
Note the container image it uses and the arguments that are passed to the container (the BigQuery dataset and the project ID).
Model Registry
Once the training job has finished, the resulting model is uploaded to the model registry. Let’s have a look:
- Click **Model Registry** in the navigation menu
- Click **bootkon-model**
- Click **VERSION DETAILS**
Here you can see that a model in the Vertex AI Model Registry is made up of a container image and a model artifact location. When you deploy a model, Vertex AI simply starts the container and points it to the artifact location.
Endpoint for Predictions
The endpoint is created in a parallel branch in the pipeline you just ran. You can deploy models to an endpoint through the model registry.
- Click **Online Prediction** in the navigation menu
- Click **bootkon-endpoint**
You can see that the endpoint currently has one model deployed, with all traffic routed to it (traffic split is 100%). When you scroll down, you get live graphs as predictions come in.

You can also train and deploy models on Vertex AI entirely through the UI. Let's have a more detailed look.
Vertex AI Pipelines
Let's have a look at the pipeline as well.
- Click **Pipelines** in the navigation menu
- Click **bootkon-pipeline-…**
You can now see the individual steps of the pipeline. Please click through them and look at the Pipeline run analysis on the right-hand side as you cycle through the steps.
Click **Expand Artifacts**. You can now see expanded yellow boxes; these are Vertex AI artifacts created as a result of the previous step.
Feel free to explore the UI in more detail on your own!
Making predictions
Now that the endpoint has been deployed, we can send transactions to it to assess whether they are fraudulent or not.
We can use `curl` to send transactions to the endpoint.
Have a look at `predict.sh`. In line 9, it uses `curl` to call the endpoint with a data file named `instances.json` containing 3 transactions.
Let’s execute it:
```bash
./src/ml/predict.sh
```
The result should be a JSON object with a `predictions` key containing the prediction for each of the 3 transactions: `1` means fraud and `0` means non-fraud.
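To post-process the response programmatically rather than reading the raw JSON, a few lines of Python suffice. The sample response below is illustrative; the actual values depend on your model and the transactions you send.

```python
import json

# Illustrative response in the shape Vertex AI online prediction returns;
# assumption: real values come from calling the deployed endpoint.
raw = '{"predictions": [0.0, 1.0, 0.0]}'

response = json.loads(raw)
# Map each prediction to a human-readable label.
labels = ["fraud" if p >= 0.5 else "non-fraud" for p in response["predictions"]]
print(labels)  # → ['non-fraud', 'fraud', 'non-fraud']
```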
Success
Congratulations, intrepid ML explorer! 🚀 You’ve successfully wrangled data, trained models, and unleashed the power of Vertex AI. If your model underperforms, remember: it’s not a bug—it’s just an underfitting feature! Keep iterating, keep optimizing, and may your loss functions always converge. Happy coding! 🤖✨