JavaScript
-
How to Start a Splash Microservice for Scraping JavaScript Webpages in Google Cloud Run
Learn a simple way to build a production-ready Splash server in the cloud. In previous posts, we introduced how to start Splash locally using Docker. In production, we would want to put the Splash server somewhere all our scraping projects can access. Since Splash is Dockerized, a simple solution is to start a Splash microservice in…
-
How to Use Splash with Proxies for Scraping JavaScript Webpages
Learn different ways to use proxies for web scraping with Splash. In web scraping, we often need proxies because of IP blocking or geolocation-sensitive data, meaning the scraped data can change depending on the geolocation of the IPs making the requests. It is straightforward to specify proxies for regular web scraping. However, when…
-
How to Scrape JavaScript Webpages with Splash, Requests, and lxml in Python
Learn a simple way to get data from JavaScript websites. Splash is a JavaScript rendering service developed by Scrapinghub, the same company that develops the popular scraping framework Scrapy. It is particularly useful for scraping web pages that are created by JavaScript frameworks like Angular, React, and Vue. Scraping such pages is challenging, if not impossible, with…
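As a rough illustration of how Requests talks to Splash, the sketch below only builds the query parameters for Splash's `render.html` HTTP endpoint; it does not make any network call. It assumes a Splash container listening on the default port 8050 at `localhost`. The names `SPLASH_RENDER_URL` and `build_render_params` are hypothetical, and the default `wait` and `timeout` values are arbitrary choices for illustration. `url`, `wait`, `timeout`, and `proxy` are arguments that `render.html` accepts.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: a Splash instance is listening on its default HTTP port,
# e.g. started locally with `docker run -p 8050:8050 scrapinghub/splash`.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_render_params(
    target_url: str,
    wait: float = 2.0,
    timeout: float = 30.0,
    proxy: Optional[str] = None,
) -> dict:
    """Build query parameters for Splash's render.html endpoint (hypothetical helper).

    wait    -- seconds Splash waits after the page loads so JavaScript can run
    timeout -- overall render timeout in seconds
    proxy   -- optional proxy URL such as "http://user:pass@host:port"
    """
    if wait < 0:
        raise ValueError("wait must be non-negative")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    params = {"url": target_url, "wait": wait, "timeout": timeout}
    # Only send the proxy argument when one is actually configured.
    if proxy is not None:
        params["proxy"] = proxy
    return params


if __name__ == "__main__":
    # Print the full request URL so it can be inspected or pasted into a browser.
    params = build_render_params("https://example.com", proxy="http://127.0.0.1:3128")
    print(f"{SPLASH_RENDER_URL}?{urlencode(params)}")
```

With the `requests` package installed, the parameters could then be sent with `requests.get(SPLASH_RENDER_URL, params=build_render_params("https://example.com"))`, and the rendered markup in `response.text` parsed with `lxml.html.fromstring(...)` so it can be queried with XPath. Keeping parameter building separate from the HTTP call makes it easy to switch Splash hosts or add a proxy per request.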
-
Sort JavaScript/TypeScript Modules Uniformly and Automatically with Git Hooks and CI Pipelines
As your JavaScript/TypeScript project grows, it imports more and more modules. If the imports are not sorted properly, they look messy and become difficult to read and maintain. However, sorting them manually is a tedious job. Worse still, if you work in a team, everyone…
-
Create a Pre-commit Git Hook to Check and Fix Your JavaScript/TypeScript Code Automatically
Improve your code quality using ESLint, Prettier, lint-staged, and Husky. It is important to have a uniform coding style in a project, especially a JavaScript/TypeScript project. This is because JavaScript is a very flexible language, and TypeScript, while stricter in syntax, is still quite flexible in formatting. Imagine some part of the code…
-
How to Use Selenium to Simulate Manual Actions for Website Tests in Python
Selenium is a tool commonly used to automate website tests in the browser. You can simulate all kinds of manual actions with Selenium. In this post, we will introduce some manual actions that are commonly used in website tests and web scraping. Preparation: We need to install Selenium and some dependencies before getting started. We…
-
How to Scrape JavaScript Webpages Using Selenium in Python
Due to the increasing popularity of modern JavaScript frameworks such as React, Angular, and Vue, more and more websites are now built dynamically with JavaScript. This poses a challenge for web scraping because the HTML markup is not available in the page source returned by the server. Therefore, we cannot scrape these JavaScript webpages directly and need to render…
-
How to Scrape JavaScript Webpages Using ProxyCrawl in Python
Due to the increasing popularity of modern JavaScript frameworks such as React, Angular, and Vue, more and more websites are now built dynamically with JavaScript. This poses a challenge for web scraping because the HTML markup is not available in the page source returned by the server. Therefore, we cannot scrape these JavaScript webpages directly and need to render…