AutoML: Where Artificial Intelligence meets Cloud Computing


How I trained an image classification Machine Learning model using Google Cloud AutoML Vision API

Part One: The introductory part about AutoML as a Cloud service

Artificial Intelligence, especially vis-à-vis machine learning, is becoming the next big thing in the tech community. Many say that by enabling business leaders to make more informed decisions, helping researchers look at problems in new ways, and offering round-the-clock insights that no human could possibly contextualize alone, Artificial Intelligence is set to be one of humanity’s best allies in the future.

Now, although I tweeted this about ten months ago, it is interesting that to this day, discussions about feature engineering amongst Artificial Intelligence and Machine Learning experts usually revolve around how difficult and exhausting it tends to be. Still, it is one of the most crucial tasks and plays a major role in determining the outcome of a Machine Learning model. Come to think of it, wouldn't life be a boatload easier if there were a way to abstract away the stress that comes with feature engineering and, by extension, the machine learning workflow? That is where Automated Machine Learning (AutoML) comes in.

image.png

AutoML is a set of versatile tools that automate the machine learning process end to end, producing simpler solutions and speeding up the creation of those solutions and models. Although AutoML can be done locally using a neural architecture search, which usually goes through all the phases of ML development in a systematic way until the model is trained, it is worth noting that one of the easiest ways to use AutoML is through a cloud service.

The general consensus is that cloud computing, which involves the delivery of different services through the internet, will continue to grow and provide many benefits. Cloud Computing models are usually divided into three types: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). In my opinion, this classification misses a rather important model of cloud computing: Machine Learning as a Service (MLaaS). MLaaS is an umbrella term for various cloud-based platforms that cover most machine learning infrastructure issues, such as data pre-processing, model training, model evaluation and, ultimately, prediction. AutoML, which automates the entire Machine Learning workflow, falls under Machine Learning as a Service.

Amazon SageMaker Autopilot, Microsoft Azure Automated Machine Learning and Google Cloud AutoML are the three leading AutoML cloud services that allow for fast model training and deployment.

image.png

  • Amazon SageMaker Autopilot

Part of Amazon Web Services (AWS), this automatically trains and tunes the best machine learning models for classification or regression based on the given data, whilst allowing users to maintain full control and visibility.

image.png

Starting with their raw data, users select a label or target. Autopilot then searches for candidate models for users to review and choose from. All of these steps are documented in executable notebooks that give users full control and reproducibility of the process. This includes a leaderboard of model candidates to help users select the best model for their needs.

  • Microsoft Azure Automated Machine Learning

image.png

As a Microsoft Learn Student Ambassador, I find this service particularly interesting. In line with the aim to empower professional and non-professional machine learning engineers to build machine learning models rapidly, the Microsoft Azure Automated Machine Learning service starts with automatic feature selection, followed by model selection and hyperparameter tuning on the selected model to generate the most optimized model for the task at hand. Users can create models using either a no-code UI or code-first notebooks. Users can also quickly customize their model and apply control settings to iterations, thresholds, validations, blocked algorithms and other experimental criteria.

  • Google Cloud AutoML

Google Cloud AutoML is a suite of Machine Learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs. It relies on Google's state-of-the-art transfer learning and neural architecture search technologies.

The important point to note is that Cloud AutoML is not just one thing. It is a suite of different products, each focused on particular use cases and data types. For example, for image data, there's AutoML Vision and for video data, there's AutoML Video Intelligence. For natural language, there's AutoML Natural Language and for translation, there's AutoML Translation. Finally, for general structured data, there's AutoML Tables.

image.png

Users can use Cloud AutoML to train, evaluate, improve and deploy models based on the available data. Within a few minutes, I had Google Cloud AutoML working on my own custom machine learning model, as explained below.

Part Two: The part where I classified images of clouds using Google Cloud AutoML Vision

AutoML Vision helps developers with limited ML expertise train high-quality image recognition models. I used it to classify images of clouds. Here is a step-by-step explanation of how I did that.

image.png

Here is the Google Cloud Platform's dashboard. The first step I took was logging into the platform.

  • Since AutoML Vision provides an interface for all the steps in training an image classification model and generating predictions on it, I started by enabling the Cloud AutoML API.

  • From the Navigation menu, I selected APIs & Services > Library.

image.png

  • In the search bar, I typed in "Cloud AutoML", then clicked on the Cloud AutoML API result and then clicked Enable.

This took about a minute to set up.

  • Thereafter, I opened this AutoML UI link in a new browser.

The next step was to activate Cloud Shell.

Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud. Cloud Shell provides command-line access to the Google Cloud resources.

  • In the Cloud Console, in the top-right toolbar, I clicked the Activate Cloud Shell button.

image.png

  • I then clicked Continue.

It took a few moments to connect to the environment.

When connected, by default, I was already authenticated, and the project was set to my PROJECT_ID.

image.png

Note: gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion. For full documentation of gcloud, see the gcloud command-line tool overview.

  • In Cloud Shell, I used the following commands to create environment variables, replacing <USERNAME> with my own username:
export PROJECT_ID=$DEVSHELL_PROJECT_ID
export QWIKLABS_USERNAME=<USERNAME>
  • I then ran the following command to give AutoML permissions:
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$QWIKLABS_USERNAME" \
    --role="roles/automl.admin"
  • Thereafter, I created a storage bucket by running the following:
gsutil mb -p $PROJECT_ID \
    -c standard \
    -l us-central1 \
    gs://${PROJECT_ID}-vcm/
  • In the Google Cloud console, I opened the Navigation menu and clicked on Cloud Storage to see it.
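Since the bucket name is simply the project ID with a "-vcm" suffix, it can be checked locally before running gsutil mb. The sketch below is only an illustration: the project ID is hypothetical, the helper is_valid_bucket_name is my own name, and it covers just a basic subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores and dots; a letter or digit at each end).

```shell
#!/bin/sh
# Use the C locale so that [a-z] matches lowercase ASCII letters only.
LC_ALL=C
export LC_ALL

# Hypothetical project ID, used only for illustration.
PROJECT_ID="my-sample-project"
BUCKET="${PROJECT_ID}-vcm"

# Return 0 when the name passes a basic subset of the naming rules:
# 3-63 characters, only lowercase letters, digits, dashes, underscores
# and dots, starting and ending with a letter or digit.
is_valid_bucket_name() {
  name="$1"
  len=${#name}
  [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1
  case "$name" in
    *[!a-z0-9._-]*) return 1 ;;          # contains a disallowed character
    [!a-z0-9]*|*[!a-z0-9]) return 1 ;;   # bad first or last character
  esac
  return 0
}

if is_valid_bucket_name "$BUCKET"; then
  echo "ok: gs://${BUCKET}/"
else
  echo "invalid bucket name: ${BUCKET}" >&2
fi
```

Bucket names also have to be globally unique, which a local check like this cannot confirm; only the actual gsutil mb call will reveal a clash.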

image.png

image.png

The next task was uploading training images to the Cloud Storage.

The specific ML model I built was meant to classify images of clouds. I knew that I needed to provide labeled training data so the model could develop an understanding of the image features associated with different types of clouds. In this instance, my model learnt to classify three different types of clouds: cirrus, cumulus, and cumulonimbus. To use AutoML Vision, I also figured that I needed to put the training images in Cloud Storage.

  • However, before adding the cloud images, I created an environment variable with the name of the bucket by running the following command in Cloud Shell:
export BUCKET=$PROJECT_ID-vcm

Note: The training images are publicly available in a Cloud Storage bucket.

  • I then used the gsutil command line utility for Cloud Storage to copy the training images into the bucket:
gsutil -m cp -r gs://spls/gsp223/images/* gs://${BUCKET}
  • When the images finished copying, I clicked the Refresh button at the top of the Storage browser and then clicked on the bucket name.

Three folders of photos then came up, one for each of the three cloud types to be classified.

image.png

Note: If I had clicked on the individual image files in each folder, I would have seen the photos I used to train my model for each type of cloud.

The next task was creating a dataset.

Since my training data was now in Cloud Storage, I needed a way for AutoML Vision to access it. This is done with a CSV file where each row contains a URL to a training image and the associated label for that image. Because I was working with a ready-made dataset, this CSV file had already been created for me; I just needed to update it with my bucket name.
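For anyone preparing their own images instead of a ready-made file, a CSV of that shape (image URL, then label) could be generated from a folder-per-label layout. The following is a minimal sketch under stated assumptions: the bucket name, the local stand-in folders and the file names are all hypothetical, and the folder name is assumed to double as the label.

```shell
#!/bin/sh
# Hypothetical bucket name and a local stand-in for the image folders.
BUCKET="my-sample-project-vcm"
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/images/cirrus" "$WORKDIR/images/cumulus" "$WORKDIR/images/cumulonimbus"
touch "$WORKDIR/images/cirrus/1.jpg" "$WORKDIR/images/cirrus/2.jpg"
touch "$WORKDIR/images/cumulus/1.jpg" "$WORKDIR/images/cumulonimbus/1.jpg"

# Emit one "gs://bucket/label/file,label" row per image; the folder
# name doubles as the label.
CSV="$WORKDIR/labels.csv"
: > "$CSV"
for dir in "$WORKDIR"/images/*/; do
  label="$(basename "$dir")"
  for img in "$dir"*.jpg; do
    [ -e "$img" ] || continue   # skip folders with no .jpg files
    echo "gs://${BUCKET}/${label}/$(basename "$img"),${label}" >> "$CSV"
  done
done

cat "$CSV"
```

The script only builds the text file; the images themselves would still need to be copied to the matching paths in the bucket.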

  • I then ran the following command to copy the file to my Cloud Shell instance:
gsutil cp gs://spls/gsp223/data.csv .
  • Thereafter, I updated the CSV with the files in my project:
sed -i -e "s/placeholder/${BUCKET}/g" ./data.csv
  • And then, I uploaded this file to my Cloud Storage bucket:
gsutil cp ./data.csv gs://${BUCKET}
  • Once the command was completed, I clicked the Refresh button at the top of the Storage browser and confirmed that I saw the data.csv file in my bucket.
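To make the sed step less mysterious: it replaces every occurrence of the literal word "placeholder" with the bucket name. Here is a small sketch on two made-up rows (the real data.csv is larger, and both the rows and the bucket name here are assumptions for illustration). It writes to a new file rather than editing in place, so the before and after can be compared.

```shell
#!/bin/sh
# Hypothetical bucket name and sample rows; the real data.csv is larger.
BUCKET="my-sample-project-vcm"
WORKDIR="$(mktemp -d)"
printf '%s\n' \
  "gs://placeholder/cirrus/1.jpg,cirrus" \
  "gs://placeholder/cumulus/1.jpg,cumulus" > "$WORKDIR/data.csv"

# Same substitution as in the walkthrough, written to a new file so the
# original stays untouched.
sed -e "s/placeholder/${BUCKET}/g" "$WORKDIR/data.csv" > "$WORKDIR/data.updated.csv"

# Sanity check: no row should still mention the placeholder.
REMAINING="$(grep -c 'placeholder' "$WORKDIR/data.updated.csv" || true)"
echo "rows still containing placeholder: ${REMAINING}"
cat "$WORKDIR/data.updated.csv"
```

A check like the grep count is a cheap way to catch a missed substitution before the import step, where a wrong path would only surface as an import error.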

image.png

  • I navigated back to the AutoML Vision dataset tab.

image.png

  • At the top of the console, I clicked + NEW DATASET.

  • I typed "clouds" for the Dataset name.

  • I selected "Single-Label Classification".

image.png

  • I clicked CREATE DATASET.

  • I chose Select a CSV file on Cloud Storage and added the file name to the URL for the file I just uploaded - gs://your-bucket-name/data.csv

Note: An easy way to get this link was to go back to the Cloud Console, click on the data.csv file and then, click on the copy icon in the URL field.

image.png

  • After the import was completed, I then clicked on the Images tab to see the images I uploaded.
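The number of images per label can also be cross-checked from the CSV itself, which is handy for spotting a lopsided dataset before training. This is a rough sketch on hypothetical rows in the same "URL,label" shape; the bucket name and the counts are made up.

```shell
#!/bin/sh
# Hypothetical rows in the same "URL,label" shape as the import file.
WORKDIR="$(mktemp -d)"
printf '%s\n' \
  "gs://my-sample-project-vcm/cirrus/1.jpg,cirrus" \
  "gs://my-sample-project-vcm/cirrus/2.jpg,cirrus" \
  "gs://my-sample-project-vcm/cumulus/1.jpg,cumulus" \
  "gs://my-sample-project-vcm/cumulonimbus/1.jpg,cumulonimbus" > "$WORKDIR/data.csv"

# Count rows per label (the second comma-separated field).
COUNTS="$(awk -F, '{ n[$2]++ } END { for (l in n) print l, n[l] }' "$WORKDIR/data.csv" | sort)"
echo "$COUNTS"
```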

image.png

With the data imported, the next major step was to train my Machine Learning model. Luckily, AutoML Vision handled this for me automatically, without requiring me to write any of the model code.

  • To train my clouds model, I navigated to the Train tab and clicked Start Training.

  • I entered a name for my model. I named mine salimmodel.

  • Thereafter, I left Cloud-hosted selected and then clicked Continue.

  • I set the node hours to 8 as opposed to the recommended 16 hours.

image.png

  • I clicked Start Training.

  • Luckily, training took only around twenty (20) minutes to complete, perhaps because this was a small dataset.

image.png

In the Evaluate tab, information about Precision and Recall of the model can be seen.

image.png

  • Lastly, I scrolled down to take a look at the Confusion matrix.
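Precision and recall can both be read off a confusion matrix, so it helps to see the arithmetic once. The sketch below uses a made-up matrix, not my model's actual numbers, and assumes rows are true labels and columns are predicted labels. Precision for a label is its diagonal cell divided by its column total; recall is the same cell divided by its row total.

```shell
#!/bin/sh
# Hypothetical confusion matrix: rows are true labels, columns are
# predicted labels, in the order cirrus, cumulus, cumulonimbus.
MATRIX="cirrus 9 1 0
cumulus 0 10 0
cumulonimbus 1 0 9"

# Precision = diagonal / column sum; recall = diagonal / row sum.
# (This sample has no empty row or column, so no division by zero.)
METRICS="$(echo "$MATRIX" | awk '
  {
    label[NR] = $1
    for (j = 2; j <= NF; j++) {
      m[NR, j - 1] = $j
      rowsum[NR] += $j
      colsum[j - 1] += $j
    }
  }
  END {
    for (i = 1; i <= NR; i++) {
      printf "%s precision=%.2f recall=%.2f\n", label[i], m[i, i] / colsum[i], m[i, i] / rowsum[i]
    }
  }')"
echo "$METRICS"
```

In this made-up example, one cirrus photo mistaken for cumulus lowers cirrus recall and cumulus precision at the same time, which is exactly the kind of pattern the confusion matrix makes visible.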

image.png

Then came the most important part: deploying my Machine Learning model and generating predictions on data it hadn't seen before.

  • I navigated to the Test & Use tab in the AutoML UI:

image.png

  • I clicked Deploy model and then clicked Deploy.

  • Deployment took around 15 minutes.

  • Then, I returned to the AutoML Vision UI, clicked Upload Images and uploaded cloud images to the online prediction UI.

When the prediction request completed, I saw the screen shown below, showing that the trained model had returned a prediction with an accuracy score of 95%.
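A prediction for an image comes back as a score per label, and the label with the highest score is the model's answer. As a rough sketch of that last step, with hypothetical scores and an assumed cut-off of 0.5 chosen purely for illustration:

```shell
#!/bin/sh
# Hypothetical per-label scores for one uploaded image.
SCORES="cirrus 0.95
cumulus 0.03
cumulonimbus 0.02"
THRESHOLD="0.5"   # assumed cut-off, chosen for illustration

# Keep the label with the highest score, and report it only when the
# score clears the threshold.
PREDICTION="$(echo "$SCORES" | awk -v t="$THRESHOLD" '
  NR == 1 || $2 + 0 > best { best = $2 + 0; label = $1 }
  END {
    if (best >= t + 0) { print label, best } else { print "no confident label", best }
  }')"
echo "$PREDICTION"
```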

image.png

TL;DR - Part Two

Here is an abridged version of the steps I took.

  1. I enabled the Cloud AutoML API and created a Cloud Storage bucket.

  2. I uploaded a labeled dataset to Cloud Storage.

  3. I connected the dataset to AutoML Vision with a CSV label file.

  4. I trained a model with AutoML Vision and deployed it on Google Cloud, where it can generate predictions via an easy-to-use REST API.

  5. I evaluated the accuracy of the model.

  6. I generated predictions on the trained model by uploading images to the AutoML UI.

Yes, I just built a 95%-accuracy Machine Learning model without a single line of Machine Learning code or performing Feature Engineering.