Sales Forecasting using Machine Learning

Updated: Jul 27, 2020


This experiment aims to create a predictive model for estimate the demand for bicycle rentals.


This experiment aims to demonstrate the process of building a regression model to forecast demand for bicycle rentals. We will use a data set to build and train our model.


The objective will be to predict the value of the variable cnt (count) that represents the number of bikes rented within a specific hour and whose range is from 1 to 977


In this case we are using Azure Machine Learning with R studio to write the scripts, to build and deploy the predict model, and Power Bi with R studio to create the dashboards. This approach intent to shows a general idea about the potential benefits for business using data intelligence via the powerful innovative technologies.

We wish to emphasis that this is a general idea for data science cases and these methods could be apply for any typical business situations to support making decisions.

The Big Data analytics Life cycle will be divided into the following stages:

In other post we have described with more details the importance of each steps above for better understanding.

The screenshot bellow represents the whole predict mode process resume, following the steps above:

So, now we will start to explain some important steps with more details bellow:

Defining the columns for correlation analysis

With these correlation analysis methods, we try to identify which variables are most important to use in the predictive model, variables with values close to 1 have a high correlation and variables close to -1 have negative correlation and the variables close to 0 have no correlation. How we can see bellow, the variable "hr" and the variable "temp", are the most important variable.

The next step is to identify the outliers, creating the box plot visualization to treat these outliers as a discrepant values, before continue the Life cycle.

These dashboards bellow will show us some interactive visualizations, signalizing discrepant values and an descriptive analysis using the Microsoft tools Power BI:

Bellow we can see the real-time dashboard case working, you can select the graphs by yourself to check any information. It is a small example how we can leave any analysis in the palm of your hands. These dashboards can be programmed to be up date on time.

How the main objective is to create a predict model with the best accurate and the minimum error, bellow we show the dashboard where we can identify the proximity of adherence of the model created with the observed data in the past with 94% of accurate and the error distribution that shows us that the error distribution is aleatory, avoiding overfitting model.

After all steps concluded before, we can test the predict model inserting new data or new data set importing file with the same format and fields we used to create the trained model, to reproduce the predict model to create predictions using Azure Machine Learning tool, as we can see bellow:


We can use the whole process to identify discrepancies, tendencies and standards that will allows us to create artificial insights and some benefits values as we can mention bellow:

  • Whip effects reduction during all process chains by improving business adherence with more assertive demand forecasts.

  • reduction of fixed and variable expenses

  • Better resizing of the workforce

  • Reduction of opportunity cost and idleness

  • Increased sales through better effective availability for previously lost demands

  • Better customer satisfaction, delivering the service on demand Greater proximity to customer needs and greater market penetration.

  • More effective marketing strategies, etc.

For more details, all files are available in our official repository bellow: