Automated machine learning in Azure Machine Learning

Leandro Oliveira
Nov 3, 2020

Updated: Nov 3, 2020

Introduction

This article aims to show all the steps needed to build an automated machine learning solution using Azure Machine Learning in the cloud.

Automation has been one of the main strategies in search of more efficient processes that make it possible to save time and generate greater productivity with few resources. In this tutorial, we will present a way to implement a machine learning process in an automated way on the cloud.

Now that information and documentation are widely accessible, there is less need to program every process by hand. Python became popular because it makes life easier for developers, and automated machine learning serves the same purpose: it reduces mechanical work and leaves more time for ideas and innovation.

Create an Azure Machine Learning workspace

Sign into the Azure portal using your Microsoft credentials.
Select ＋Create a resource, search for Machine Learning, and create a new Machine Learning resource with the following settings:
- Workspace Name: A unique name of your choice
- Subscription: Your Azure subscription
- Resource group: Create a new resource group with a unique name
- Location: Choose any available location
Wait for your workspace to be created (it can take a few minutes). Then go to it in the portal.
On the Overview page for your workspace, launch Azure Machine Learning studio (or open a new browser tab and navigate to https://ml.azure.com), and sign into Azure Machine Learning studio using your Microsoft account.
In Azure Machine Learning studio, toggle the ☰ icon at the top left to view the various pages in the interface. You can use these pages to manage the resources in your workspace.

You can manage your workspace using the Azure portal, but for data scientists and Machine Learning operations engineers, Azure Machine Learning studio provides a more focused user interface for managing workspace resources.

https://docs.microsoft.com/en-gb/learn/modules/use-automated-machine-learning/create-workspace

Create compute resources

After you have created an Azure Machine Learning workspace, you can use it to manage the various assets and resources you need to create machine learning solutions. At its core, Azure Machine Learning is a platform for training and managing machine learning models, for which you need compute on which to run the training process.

Create compute targets

Compute targets are cloud-based resources on which you can run model training and data exploration processes.

In Azure Machine Learning studio, view the Compute page (under Manage). This is where you manage the compute targets for your data science activities. There are four kinds of compute resource you can create:
- Compute Instances: Development workstations that data scientists can use to work with data and models.
- Compute Clusters: Scalable clusters of virtual machines for on-demand processing of experiment code.
- Inference Clusters: Deployment targets for predictive services that use your trained models.
- Attached Compute: Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.
On the Compute Instances tab, add a new compute instance with the following settings. You'll use this as a workstation from which to test your model:
- Compute name: enter a unique name
- Virtual Machine type: CPU
- Virtual Machine size: Standard_DS11_v2
While the compute instance is being created, switch to the Compute Clusters tab, and add a new compute cluster with the following settings. You'll use this to train a machine learning model:
- Compute name: enter a unique name
- Virtual Machine size: Standard_DS11_v2
- Virtual Machine priority: Dedicated
- Minimum number of nodes: 2
- Maximum number of nodes: 2
- Idle seconds before scale down: 120
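
How the minimum and maximum node settings bound the size of a compute cluster can be sketched as a simple clamp. This is a simplified illustration and not Azure's actual autoscaling logic; the function name `target_node_count` and the one-node-per-queued-job assumption are hypothetical.

```python
def target_node_count(queued_jobs, min_nodes, max_nodes):
    """Clamp the nodes requested by queued work to the cluster's limits.

    Assumption for illustration: each queued job asks for one node.
    """
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(queued_jobs, max_nodes))


# Settings used in this tutorial: minimum and maximum are both 2.
MIN_NODES, MAX_NODES = 2, 2

for jobs in (0, 1, 5):
    nodes = target_node_count(jobs, MIN_NODES, MAX_NODES)
    print(f"{jobs} queued job(s) -> {nodes} node(s)")
```

With both limits set to 2, the sketch always returns 2, so the cluster size does not change with the workload. A minimum of 0 would allow an idle cluster to shrink to no nodes once the idle period has elapsed.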

Fig.1 Accessing the Resource to start creating the Machine Learning predictive model.

Fig.2 Creating Azure Machine Learning Resource.

Fig.4 Creating the resource group and the workspace.

Fig.5 Deploying the resource.

Fig.6 Resource created.

Fig.7 Accessing Azure Machine Learning studio by clicking “Launch studio”.

Fig.8 Starting the Azure Machine Learning Studio.

Before we can work in Azure Machine Learning studio, we need to create a compute resource, as shown below.

Fig.9 Creating a compute Resource as a Virtual Machine.

To create our production environment, we must follow the recommendations in the documentation to avoid unnecessary expenses and because the free account does not have all the production resources available.

The recommended settings for the compute instance are the ones listed earlier: a unique compute name, a CPU virtual machine type, and the Standard_DS11_v2 virtual machine size.

https://docs.microsoft.com/en-gb/learn/modules/use-automated-machine-learning/create-compute

Bear in mind that the cloud is a pay-as-you-go service: charges accrue only while resources are in use. It is therefore recommended to turn off any resources you are not using to avoid unnecessary expenses.

Remember that stopped resources still consume storage, because they must be kept in the cloud. If you are using them for academic purposes, it is recommended to delete all resources when you finish so that you do not incur storage charges.

Fig.10 Selecting the recommended compute resource.

Fig.11 Giving a name to the compute resource.

Fig.12 Compute Instance Running.

The compute targets will take some time to be created. You can move on to the next step while you wait.

Because this is a test environment, the configuration is kept small, so the cluster has only a few nodes.

Fig.13 Compute Cluster running.

Machine learning models must be trained with existing data. In this case, you'll use a dataset of historical bicycle rental details to train a model that predicts the number of bicycle rentals that should be expected on a given day, based on seasonal and meteorological features.

Create a dataset

The data is derived from Capital Bikeshare and is used in accordance with the published data license agreement. Create a new dataset from web files with the following settings:

· Basic Info:

o Web URL: https://aka.ms/bike-rentals

o Name: bike-rentals

o Dataset type: Tabular

o Description: Bicycle rental data

· Settings and preview:

o File format: Delimited

o Delimiter: Comma

o Encoding: UTF-8

o Column headers: Use headers from first file

o Skip rows: None

· Schema:

o Include all columns other than Path

o Review the automatically detected types

· Confirm details:

o Do not profile the dataset after creation
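
The file-format settings above (comma delimiter, headers taken from the first row, every column except Path, automatically detected types) can be sketched with the Python standard library. The sample rows and column names below are hypothetical placeholders rather than the real bike-rentals data, and `infer_type` is only a rough stand-in for the studio's own type detection.

```python
import csv
import io

# Hypothetical sample: comma-delimited text with headers in the first row.
SAMPLE = """day,mnth,temp,rentals
1,1,0.34,331
2,1,0.36,131
3,1,0.20,120
"""


def load_rows(text, exclude=("Path",)):
    """Parse delimited text, using the first row as headers and dropping excluded columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    return [{k: v for k, v in row.items() if k not in exclude} for row in reader]


def infer_type(values):
    """Guess a column type: try integer first, then decimal, otherwise string."""
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


rows = load_rows(SAMPLE)
schema = {column: infer_type([row[column] for row in rows]) for column in rows[0]}
print(schema)
```

Reviewing the inferred schema before confirming matters because a column detected as the wrong type (for example, a numeric code read as text) changes how the model treats that feature.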

Fig.14 The original dataset format.

We register the dataset directly from the public web location where it is stored.

Fig.15 Searching for the dataset.

Fig.16 Creating dataset from web files.

Fig.17 Dataset Setting and Previews.

Fig.18 Dataset Schema.

Fig.19 Confirming details.

Fig.20 Registered Dataset.

Train a machine learning model

Azure Machine Learning includes an automated machine learning capability that leverages the scalability of cloud compute to automatically try multiple pre-processing techniques and model-training algorithms in parallel to find the best performing supervised machine learning model for your data.

Run an automated machine learning experiment

In Azure Machine Learning, operations that you run are called experiments. Follow the steps below to run an experiment that uses automated machine learning to train a regression model that predicts bicycle rentals.

Create a new Automated ML run with the following settings:

· Select dataset:

o Dataset: bike-rentals

· Configure run:

o New experiment name: mslearn-bike-rental

o Target column: rentals (this is the label the model will be trained to predict)

o Training compute target: the compute cluster you created previously

· Task type and settings:

o Task type: Regression (the model will predict a numeric value)

o Additional configuration settings:

- Primary metric: Select Normalized root mean square error (more about this metric later!)
- Explain best model: Selected - this option causes automated machine learning to calculate feature importance for the best model, making it possible to determine the influence of each feature on the predicted label.
- Blocked algorithms: Block all other than RandomForest and LightGBM - normally you'd want to try as many as possible, but doing so can take a long time!
- Exit criterion:
  - Training job time (hours): 0.25 - this causes the experiment to end after a maximum of 15 minutes.
  - Metric score threshold: 0.08 - this causes the experiment to end if a model achieves a normalized root mean square error metric score of 0.08 or less.

o Featurization settings:

- Enable featurization: Selected - this causes Azure Machine Learning to automatically preprocess the features before training.
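
The two exit criteria can be read as a simple stopping rule: the experiment ends when the time budget is used up or when a model scores at or below the threshold, whichever happens first. The sketch below is only an illustration of that rule, not how automated machine learning is implemented; the function name `should_stop` is hypothetical.

```python
# Values copied from the run configuration in this tutorial.
MAX_HOURS = 0.25        # Training job time (hours), i.e. 15 minutes
SCORE_THRESHOLD = 0.08  # Metric score threshold (lower is better for this metric)


def should_stop(elapsed_hours, best_score=None,
                max_hours=MAX_HOURS, threshold=SCORE_THRESHOLD):
    """Return True when either exit criterion is met."""
    out_of_time = elapsed_hours >= max_hours
    good_enough = best_score is not None and best_score <= threshold
    return out_of_time or good_enough


print(should_stop(0.10, 0.12))  # neither criterion met yet
print(should_stop(0.10, 0.07))  # score threshold reached
print(should_stop(0.25, 0.12))  # time limit reached
```

Because either condition ends the run, the best model reported is only the best one found before the first criterion was met.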

Fig.21 Creating an Automated ML page.

Fig.22 Selecting the dataset.

Fig.23 Configure Run.

Fig.24 Select Task Type (the model will predict a numeric value).

Fig.25 Additional Configurations.

Fig.26 Featurization Settings.

Fig.27 Creating a new Automated ML.

Fig.28 Status Running.

Review the best model

After the experiment has finished, you can review the best performing model that was generated. Note that in this case we used exit criteria to stop the experiment, so the "best" model found by the experiment may not be the best possible model, just the best one found within the time allowed for this exercise.

On the Details tab of the automated machine learning run, note the best model summary.
Select the Algorithm name for the best model to view its details.

The best model is identified based on the evaluation metric you specified (Normalized root mean square error). To calculate this metric, the training process used some of the data to train the model, and applied a technique called cross-validation to iteratively test the trained model with data it wasn't trained with and compare the predicted value with the actual known value. The differences between the predicted and actual values (known as the residuals) indicate the amount of error in the model. The root mean square error is calculated by squaring the errors across all of the test cases, finding the mean of these squares, and then taking the square root; the normalized version divides that result by the range of the label values. In short, the smaller this value is, the more accurately the model is predicting.
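
As a worked example of the metric, here is a minimal sketch that computes the root mean square error and a normalized version of it. It assumes normalization by the range of the true values, and the sample numbers are made up for illustration; they are not results from this experiment.

```python
import math


def rmse(y_true, y_pred):
    """Square the residuals, average them, then take the square root."""
    residuals = [pred - true for true, pred in zip(y_true, y_pred)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed normalization)."""
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("true values must not all be identical")
    return rmse(y_true, y_pred) / value_range


# Made-up rental counts, for illustration only.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 390]

print(rmse(y_true, y_pred))             # every residual is +/-10, so RMSE is 10.0
print(normalized_rmse(y_true, y_pred))  # 10.0 / 300, roughly 0.033
```

Normalizing makes the score independent of the scale of the label, which is why a threshold such as 0.08 can be read as "typical error of about 8% of the label's range".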

Fig.29 Details after Run.

Fig.30 Run Metrics.

The Model explanations below are used to understand what features are directly impacting the model and why.

Fig.31 Model Explanation.

Select the Explanations tab, and view the Global Importance chart. This shows how much each feature in the dataset influences the label prediction, like this:

Fig.32 Chart Type.

Next to the Normalized root mean square error value, select View all other metrics to see values of other possible evaluation metrics for a regression model.
Select the Metrics tab and select the Residuals and Predicted vs. True charts if they are not already selected. Then review the charts, which show the performance of the model by comparing the predicted values against the true values, and by showing the residuals (differences between predicted and actual values) as a histogram.

The Predicted vs. True chart should show a diagonal trend in which the predicted value correlates closely to the true value. A dotted line shows how a perfect model should perform, and the closer the line for your model's average predicted value is to this, the better its performance. A histogram below the line chart shows the distribution of true values.

Fig.33 Metrics.

The Residual Histogram shows the frequency of residual value ranges. Residuals represent variance between predicted and true values that can't be explained by the model - in other words, errors; so what you should hope to see is that the most frequently occurring residual values are clustered around 0 (in other words, most of the errors are small), with fewer errors at the extreme ends of the scale.
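
To make the idea of a residual histogram concrete, the sketch below groups residuals (predicted minus true) into fixed-width bins and counts how many fall into each. The values and the bin width are made up for illustration, and `residual_histogram` is a hypothetical helper, not part of Azure Machine Learning.

```python
from collections import Counter


def residual_histogram(y_true, y_pred, bin_width):
    """Count residuals per bin; each bin is keyed by its lower edge."""
    counts = Counter()
    for true, pred in zip(y_true, y_pred):
        residual = pred - true
        lower_edge = (residual // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Made-up values, for illustration only.
y_true = [100, 200, 300, 400, 500]
y_pred = [105, 195, 310, 380, 502]

histogram = residual_histogram(y_true, y_pred, bin_width=10)
for edge, count in histogram.items():
    print(f"[{edge:>4}, {edge + 10:>4}): {'#' * count}")
```

In a well-behaved model the tallest bars sit in the bins next to 0, with only a few residuals in the outer bins.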

Deploy a predictive service

In Azure Machine Learning, you can deploy a service to Azure Container Instances (ACI) or to an Azure Kubernetes Service (AKS) cluster. For production scenarios, an AKS deployment is recommended, for which you must create an inference cluster compute target. In this exercise, you'll use an ACI service, which is a suitable deployment target for testing, and does not require you to create an inference cluster.

In Azure Machine Learning studio, open your automated machine learning experiment and view the Details tab.

Select the algorithm name for the best model. Then, on the Model tab, use the Deploy button to deploy the model with the following settings:

1. Name: predict-rentals

2. Description: Predict cycle rentals

3. Compute type: ACI

4. Enable authentication: Selected.

Fig.34 Deploy a Model.

Fig.35 Deploy status completed.

In Azure Machine Learning studio, view the Endpoints page and select the predict-rentals real-time endpoint. Then select the Consume tab and note the following information there. You need this information to connect to your deployed service from a client application.

- The REST endpoint for your service
- The Primary Key for your service

Fig.36 Consume model.

Fig.37 Creating a new file.

Test the deployed service

Now that you've deployed a service, you can test it using some simple code.

1. With the Consume page for the predict-rentals service open in your browser, open a new browser tab and open a second instance of Azure Machine Learning studio. Then in the new tab, view the Notebooks page (under Author).

2. In the Notebooks page, under My files, use the 🗋 button to create a new file with the following settings:

o File location: Users/your user name

o File name: Test-Bikes

o File type: Notebook

o Overwrite if already exists: Selected

3. When the new notebook has been created, ensure that the compute instance you created previously is selected in the Compute box, and that it has a status of Running.

4. Use the ≪ button to collapse the file explorer pane and give you more room to focus on the Test-Bikes.ipynb notebook tab.

5. In the rectangular cell that has been created in the notebook, paste the following code:

6. Switch to the browser tab containing the Consume page for the predict-rentals service, and copy the REST endpoint for your service. Then switch back to the tab containing the notebook and paste the endpoint into the code, replacing YOUR_ENDPOINT.

7. Switch to the browser tab containing the Consume page for the predict-rentals service, and copy the Primary Key for your service. Then switch back to the tab containing the notebook and paste the key into the code, replacing YOUR_KEY.

8. Save the notebook. Then use the ▷ button next to the cell to run the code.

Verify that the predicted number of rentals for each day in the five-day period is returned.
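
A minimal sketch of what a test client for the notebook cell could look like is shown here. It assumes the service accepts a JSON body of the form `{"data": [...]}` and key-based authentication through an `Authorization: Bearer` header; the feature rows are hypothetical placeholders whose values and column order must be replaced with ones matching the columns your model was trained on, and the shape of the response depends on the deployed scoring script.

```python
import json
import urllib.request

ENDPOINT = "YOUR_ENDPOINT"  # replace with the REST endpoint from the Consume tab
KEY = "YOUR_KEY"            # replace with the Primary Key from the Consume tab

# Hypothetical feature rows for five days (placeholder values, assumed column order).
FEATURES = [
    [1, 1, 2022, 1, 0, 6, 0, 2, 0.34, 0.36, 0.81, 0.16],
    [2, 1, 2022, 1, 0, 0, 0, 2, 0.36, 0.35, 0.70, 0.25],
    [3, 1, 2022, 1, 0, 1, 1, 1, 0.20, 0.19, 0.44, 0.25],
    [4, 1, 2022, 1, 0, 2, 1, 1, 0.20, 0.21, 0.59, 0.16],
    [5, 1, 2022, 1, 0, 3, 1, 1, 0.23, 0.23, 0.44, 0.19],
]


def build_request(endpoint, key, rows):
    """Build a POST request carrying the feature rows as JSON."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + key,
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def predict(endpoint, key, rows):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(endpoint, key, rows)) as response:
        return json.loads(response.read().decode("utf-8"))


if ENDPOINT != "YOUR_ENDPOINT":
    print(predict(ENDPOINT, KEY, FEATURES))
else:
    print("Set ENDPOINT and KEY before running.")
```

Keeping the request construction in its own function makes it possible to inspect the headers and body before anything is sent to the service.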

Fig.38 Predictions for the next 5 days.

Summary

In this article, you explored machine learning and learned how to use the automated machine learning capability of Azure Machine Learning to train and deploy a predictive model.

The web service you created is hosted in an Azure Container Instance. If you don't intend to experiment with it further, you should delete the endpoint to avoid accruing unnecessary Azure charges. You should also stop the training cluster and compute instance resources until you need them again.