What is Puppeteer and how is it useful

Puppeteer is a Node.js library that lets you control the Chrome browser programmatically.

This means Puppeteer can do the things you would normally do in a browser by hand. Can you type a webpage address into Chrome's address bar and open it? Puppeteer can do that too. See the script below.

const puppeteer = require('puppeteer');

const run = async () => {
  // Launch a visible (non-headless) browser window.
  const browser = await puppeteer.launch({
    headless: false
  });
  // Open a new tab and navigate to the page.
  const page = await browser.newPage();
  await page.goto("https://techdoma.in");
};

run();

Save this file as index.js and run the script as follows:

node index.js

This launches a Chrome instance and opens https://techdoma.in in it.

The headless option controls whether the browser window is visible: headless: false opens a normal browser window, while headless: true (the default) runs the browser without a visible UI.

Taking a screenshot or PDF of a website

You can also take a screenshot of the webpage using the page.screenshot API. See the code below.

const puppeteer = require('puppeteer');

const run = async () => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto("https://techdoma.in");
  await page.screenshot({
    path: 'techdomain.png'
  });
  await browser.close();
}

run();

This saves a screenshot of the webpage to the file techdomain.png.

You can also generate a PDF using the page.pdf API. Note that PDF generation requires headless mode, so headless must be true.

const puppeteer = require('puppeteer');

const run = async () => {
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();
  await page.goto("https://techdoma.in");
  await page.screenshot({
    path: 'techdomain.png'
  });
  await page.pdf({
    path: 'techdomain.pdf',
    format: 'A4'
  });
  await browser.close();
}

run();

You can also use a Chrome browser that is already installed on your machine by passing its path in executablePath.

const browser = await puppeteer.launch({
  headless: false,
  executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
});

Replace executablePath with the path to Google Chrome on your OS.

Real use cases with Puppeteer

One of the best use cases for Puppeteer is server-side rendering of HTML. The gist of the solution is to render a webpage in the headless browser, extract the resulting HTML, and return it.

The HTML we get back reflects the page after its JavaScript has run. This helps with SEO: we can render a JavaScript-heavy page with Puppeteer and return the final HTML.

The page.content API returns the rendered HTML after the JavaScript has run.

For example, let us create a page whose content is inserted using JavaScript. The file below is called jspage.html. A server is running on port 8080 on localhost, so the file can be accessed at http://localhost:8080/jspage.html

<!DOCTYPE html>
<html>
<head>
</head>
<body>
  <script>
    document.querySelector("body").innerHTML = "JS Content inserted"  
  </script>
</body>
</html>

We can render this page using puppeteer and serve the final content.

const puppeteer = require('puppeteer');

const run = async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto("http://localhost:8080/jspage.html");
  console.log(await page.content());
  await page.screenshot({
    path: 'jspage.png'
  });
  await browser.close();
}

run();

In the console, you will see the rendered content of the page:

<!DOCTYPE html><html><head>
</head>
<body>JS Content inserted

</body></html>

You can also see the rendered page in the screenshot jspage.png.