Web Scraping

Updated: Sep 1, 2020

Summary

Web scraping, also called web harvesting or web data extraction, is a form of data scraping used to extract data from websites. Although it can be done manually by a software user, the term typically refers to automated processes implemented with a bot or web crawler. Web scraping has numerous uses, so this article presents two applications that show how it works and why it is useful. The first is based on an article published by The New York Times, and the second is a business case: collecting data automatically to support business research.

Introduction

Many Americans have gotten used to President Trump's lies. The source article organizes almost all the lies he has publicly told since he took the oath of office, extends the list through November 11th, and provides links to the facts in each case.

Objective

The purpose of this research is to assess President Trump's lies over time.

Methodology

For this analysis we use the R programming language to collect data from The New York Times article and build a dataset that supports statistical analysis in Power BI, where it is possible to create different dashboards for different purposes.

The figure below shows R and Python as the most useful data programming languages in the world, which reinforces our choice of reliable technology for data-driven decisions.

The figure below shows the original format on The New York Times website, where we did the web scraping.

The figure below shows the result of the web scraping: the information above tabulated in a dataset using the R programming language.
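The tabulation step can be sketched in a few lines. The analysis in this article was done in R; the sketch below uses Python with only the standard library as a stand-in, not the original script. The HTML sample, the `entry` class name, the column names, and the `EntryParser` and `scrape_entries` helpers are all hypothetical, since the real page markup is not reproduced here.

```python
from html.parser import HTMLParser

# Hypothetical markup: an assumption for illustration, not the real page.
SAMPLE_HTML = """
<span class="entry"><strong>Jan. 21</strong> "First example statement."
  <a href="https://example.com/fact-1">(Example fact check one.)</a></span>
<span class="entry"><strong>Jan. 23</strong> "Second example statement."
  <a href="https://example.com/fact-2">(Example fact check two.)</a></span>
"""


class EntryParser(HTMLParser):
    """Collect one row (date, statement, explanation, url) per entry span."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # row being built, or None when outside an entry
        self._field = None  # column that incoming text is appended to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "entry":
            self._row = {"date": "", "statement": "", "explanation": "", "url": ""}
            self._field = "statement"
        elif self._row is not None and tag == "strong":
            self._field = "date"
        elif self._row is not None and tag == "a":
            self._field = "explanation"
            self._row["url"] = attrs.get("href", "")

    def handle_endtag(self, tag):
        if self._row is None:
            return
        if tag in ("strong", "a"):
            self._field = "statement"
        elif tag == "span":
            # Collapse runs of whitespace before storing the finished row.
            self.rows.append({k: " ".join(v.split()) for k, v in self._row.items()})
            self._row = None

    def handle_data(self, data):
        if self._row is not None:
            self._row[self._field] += data


def scrape_entries(html):
    """Return a list of row dictionaries parsed from the given HTML text."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_entries(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each dictionary corresponds to one row of the dataset, so the list can be written to a CSV file or loaded into a reporting tool.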

The figure below shows the heat calendar, where it is possible to see the lies throughout the year.
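A heat calendar is built from a count of entries per day. As a minimal sketch, assuming the scraped dates have already been normalized to ISO format, the aggregation could look like the following in Python. The sample dates and the `daily_counts` and `calendar_cells` helpers are hypothetical, and the chart itself was produced in Power BI, not with this code.

```python
from collections import Counter
from datetime import date

# Hypothetical sample: one ISO date per scraped entry.
entry_dates = ["2017-01-21", "2017-01-21", "2017-01-23", "2017-02-03"]


def daily_counts(iso_dates):
    """Count how many entries fall on each calendar day."""
    return Counter(date.fromisoformat(d) for d in iso_dates)


def calendar_cells(counts):
    """Return (iso_week, iso_weekday, count) tuples, ordered by day.

    Week and weekday give the column and row of a calendar grid,
    and the count drives the color intensity of the cell.
    """
    cells = []
    for day, n in sorted(counts.items()):
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        cells.append((iso[1], iso[2], n))
    return cells


counts = daily_counts(entry_dates)
cells = calendar_cells(counts)
for week, weekday, n in cells:
    print(f"week {week}, weekday {weekday}: {n}")
```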

The graph is interactive: as you move your mouse over it, you can see the information in more detail. Remember that all the technologies in this article let you see any information, anytime, anywhere through your website, subject to your data privacy policy, and all data can be updated at any time or in real time.

Conclusion

The purpose of this article is only to present some technology tools that can be applied to any business reality. It does not aim at a deep analysis of the topic addressed here, since each business has its own particularities and purposes. Through this approach, we have shown that it is possible to collect data from the web, organize it in a table, and present it in a summarized, visual form through dashboards, which makes the data easier to understand and supports the generation of insights for decision making.

In addition, we will present another typical drag-and-drop tool that can be used for web scraping and that is part of the technologies we will use in the next case below.

According to the Eureka research below, Power BI is one of the most powerful Microsoft tools for Business Intelligence. In addition, it can be used with the powerful data programming languages R and Python. It can improve and extend the horizons of business analysis and visualization, especially when we consider features such as real-time updates and access from anywhere.

Therefore, below we demonstrate another web scraping process using only drag-and-drop features.


The next objective is to research prices for a specific product using web scraping, to understand the functionality and confirm how it can be used in business situations when deploying a pricing and product mix strategy.

As we can see above, Power BI has features that let you connect to a marketplace website to research prices. You just use drag and drop to choose the first piece of information you are looking for, and Power BI then selects the rest of the list on the website automatically using machine learning procedures.

It allows us to import any data from the web in table format, ready to be used in a Business Intelligence process.

In this case, for example, we can calculate the average market price for a specific product to support a marketing campaign, or use this information to remodel the product mix.
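The average price calculation can be illustrated with a short Python sketch. It assumes the scraped prices arrive as text in US-style formatting (dollar sign, comma as thousands separator). The sample values and the `parse_price` helper are hypothetical and do not come from the Power BI case above.

```python
from statistics import mean, median

# Hypothetical scraped values: assumed to be US-formatted price strings.
scraped_prices = ["$1,299.90", "$1,150.00", "$1,420.10", "$999.00"]


def parse_price(text):
    """Convert a price string such as '$1,299.90' to a float."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned)


prices = [parse_price(p) for p in scraped_prices]

average_price = mean(prices)   # arithmetic mean of the market prices
median_price = median(prices)  # middle value, less sensitive to outliers

print(f"average: {average_price:.2f}")
print(f"median:  {median_price:.2f}")
```

Reporting the median next to the mean is a design choice: a single extreme listing on a marketplace can pull the mean away from what most sellers charge.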

The Benefits of Web Scraping

So here are some examples of how web scraping is used:

  • Market research - gather valuable data about the market, competitors, the target audience and its habits, and social networks and how to use them to your advantage;

  • Price scraping - companies scrape prices to learn what other companies and retail sites charge for the same products, so they can set the best possible price in the market and attract more customers;

  • SEO - everyone who works with online business knows that SEO matters. Web scraping can help gather key data to improve SEO, increase engagement, and rank at the top of search engine results;

  • Sales intelligence - sales intelligence refers to a wide range of technologies that help salespeople find, monitor, and understand information on prospects' and existing clients' daily business. Web scraping helps gather useful data for this purpose and make the necessary changes.

For more details, all files are available in our official repository below: