How to set up a proxy for Puppeteer
This article shows how to use proxies with Puppeteer and offers advice on choosing the right approach.
If you are using Puppeteer for web scraping, it might be better to use the Crawlee framework, which simplifies the spider development process and includes a set of tools for working with proxies.
Puppeteer proxy example
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  // The proxy is set per browser context, so other contexts are unaffected.
  // Note: Puppeteer v22+ renamed this method to createBrowserContext().
  const context = await browser.createIncognitoBrowserContext({
    proxyServer: "http://127.0.0.1:8080"
  });
  const page = await context.newPage();
  await page.goto('https://example.com/');
  console.log(await page.content());
  await browser.close();
})();
In the example above, replace http://127.0.0.1:8080 with the URL of your proxy server. If you don't have one, you may want to consider the Proxy Port provider package.
Install the package:
$ npm i @proxyport/proxyport
Instantiate ProxyPort and call the getProxy() method:
import { ProxyPort } from '@proxyport/proxyport';

const proxyPort = new ProxyPort('<API_KEY>'); // replace with your API key

(async () => {
  const proxy = await proxyPort.getProxy();
  console.log(proxy);
})();
If you don't have an API key yet, Proxy Port provides instructions on how to get one for free.
After following the previous steps, you should have code that looks like this:
import puppeteer from 'puppeteer';
import { ProxyPort } from '@proxyport/proxyport';

(async () => {
  const proxyPort = new ProxyPort('<API_KEY>'); // replace with your API key
  // Ask the provider for a proxy address before creating the context.
  const proxy = await proxyPort.getProxy();
  const browser = await puppeteer.launch();
  const context = await browser.createIncognitoBrowserContext({
    proxyServer: `http://${proxy}`
  });
  const page = await context.newPage();
  await page.goto('https://example.com/');
  console.log(await page.content());
  await browser.close();
})();
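The combined example builds the proxy URL with `http://${proxy}`, which assumes that getProxy() returns a bare host:port string. The sketch below makes that assumption explicit and rejects obviously malformed values before they reach the browser. The helper name toProxyUrl and its defaultScheme parameter are hypothetical and not part of Puppeteer or the Proxy Port package.

```javascript
// Hypothetical helper: turn a proxy address into a URL string that can be
// passed as proxyServer. Assumes the input is either "host:port" or a full
// URL that already carries a scheme such as "http://" or "socks5://".
function toProxyUrl(proxy, defaultScheme = 'http') {
  if (typeof proxy !== 'string' || proxy.trim() === '') {
    throw new TypeError('proxy must be a non-empty string');
  }
  const trimmed = proxy.trim();
  // Only add a scheme when the address does not already start with one.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `${defaultScheme}://${trimmed}`;
  // The URL constructor throws on malformed input, which serves as validation.
  new URL(candidate);
  return candidate;
}
```

With a helper like this, the proxyServer value in the example could be written as toProxyUrl(proxy) instead of the template string.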
In some cases, you may encounter an error like this:
Error: net::ERR_TIMED_OUT at https://example.com/
This usually happens because free public proxy servers are often slow or offline. One way to handle the error is to retry with a different proxy on each attempt:
import puppeteer from 'puppeteer';
import { ProxyPort } from '@proxyport/proxyport';

(async () => {
  const proxyPort = new ProxyPort('<API_KEY>'); // replace with your API key
  const browser = await puppeteer.launch();
  // Try up to 10 different proxies, stopping at the first success.
  for (let i = 0; i < 10; i++) {
    const proxy = await proxyPort.getProxy();
    const context = await browser.createIncognitoBrowserContext({
      proxyServer: `http://${proxy}`
    });
    try {
      const page = await context.newPage();
      await page.goto('https://example.com/');
      console.log(await page.content());
      break;
    } catch (e) {
      console.log(`failed to load page with proxy: ${proxy}, error: ${e}\n`);
    } finally {
      // Close the context on every attempt so failed pages do not pile up.
      await context.close();
    }
  }
  await browser.close();
})();
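The retry loop above can also be factored into a small reusable function, which keeps the Puppeteer code separate from the retry policy. This is only a sketch: withProxyRetries, getProxy, and task are hypothetical names, and the function assumes that getProxy returns a fresh proxy address on each call and that task rejects when the page fails to load.

```javascript
// Hypothetical helper: call task(proxy) with a new proxy on each attempt and
// return the first successful result. Throws once every attempt has failed.
async function withProxyRetries(getProxy, task, maxAttempts = 10) {
  let lastError = new Error('no attempts were made');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const proxy = await getProxy();
    try {
      return await task(proxy);
    } catch (e) {
      // Remember the failure and move on to the next proxy.
      lastError = e;
      console.log(`attempt ${attempt} failed with proxy: ${proxy}, error: ${e}`);
    }
  }
  throw new Error(`all ${maxAttempts} attempts failed, last error: ${lastError}`);
}
```

In the Puppeteer script, getProxy could be () => proxyPort.getProxy(), and task could be a function that creates a context for the given proxy, loads the page, returns its content, and closes the context in a finally block.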
If you require intensive proxy rotation with Puppeteer, you may want to consider checking out the Crawlee framework, which includes anti-blocking features with proxy rotation as well as support for Puppeteer and Playwright.