Compressing screenshot traffic between remote WebDriver / Selenium Grid and Java test code

Some Selenium tests (especially when running in debug mode) take a large number of screenshots; some even take one before and after each step. Given that each screenshot might take up to 10 MB and that tests might run in parallel in a cluster, the overall traffic can reach gigabytes or even tens of gigabytes. Here we’re going to talk about how to reduce the traffic coming from a WebDriver or Selenium Grid endpoint to your client code (Java automated tests).

Disclaimer: I prepared all the examples in this article on my Linux desktop. While most of the examples are platform-independent, some (like splitting a command into several lines) are not.

A bit of theory - what happens when you take a screenshot with Selenium

When you take a screenshot, the browser generates a PNG file with the content of the page. The size of that file depends on several things, such as the page size (which determines the picture dimensions), how colorful the page is, how large the areas of the same color are, etc. The PNG compression algorithm takes all of that into account.
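To make the dependency on page content more tangible, here is a small Java sketch that encodes two synthetic images of the same dimensions as PNG and compares the sizes. It uses only the JDK (javax.imageio); the class and method names are invented for this illustration, and the images are artificial stand-ins for a real page, so treat the numbers it prints as an indication only.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import javax.imageio.ImageIO;

// Hypothetical demo class; it is not part of Selenium.
final class PngSizeDemo {

    // Encodes the image as PNG in memory and returns the encoded size in bytes.
    static int pngSize(BufferedImage image) {
        ImageIO.setUseCache(false); // keep the encoder from spilling to a temp file
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Builds an image in which every pixel has the same color.
    static BufferedImage flat(int width, int height, int rgb) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, rgb);
            }
        }
        return image;
    }

    // Builds an image filled with pseudo-random colors (fixed seed, so runs are repeatable).
    static BufferedImage noisy(int width, int height, long seed) {
        Random random = new Random(seed);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return image;
    }

    public static void main(String[] args) {
        System.out.println("flat:  " + pngSize(flat(400, 300, 0x336699)) + " bytes");
        System.out.println("noisy: " + pngSize(noisy(400, 300, 42L)) + " bytes");
    }
}
```

The flat image should come out far smaller than the noisy one, even though both have the same dimensions.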

However, to transfer the file data over HTTP inside a JSON response, the file content is then encoded in Base64, which adds approximately 1/3 of the original file size.
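The 1/3 figure follows from how Base64 works: every 3 bytes of input become 4 characters of output. A minimal Java sketch that checks this with the JDK encoder (the class and method names are made up for the example):

```java
import java.util.Base64;

// Hypothetical demo class showing the size overhead of Base64.
final class Base64Overhead {

    // Returns the length of the Base64 text produced for rawBytes bytes of binary data.
    static int encodedLength(int rawBytes) {
        return Base64.getEncoder().encode(new byte[rawBytes]).length;
    }

    public static void main(String[] args) {
        int raw = 3_000_000; // an arbitrary example size, roughly a 3 MB screenshot
        int encoded = encodedLength(raw);
        System.out.println("raw bytes:     " + raw);
        System.out.println("Base64 length: " + encoded);
        System.out.println("overhead:      " + (encoded - raw));
    }
}
```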

Is there a way to reduce the size transferred?

Yes, there is. The response body can be compressed with one of the compression algorithms. There is a condition you need to meet, however: the client and the server have to negotiate the compression type, and both have to support it. One of the most commonly used types is gzip.

Fortunately, the HTTP client used by the Selenium Java bindings supports gzip compression. Unfortunately, neither WebDrivers nor the Selenium Grid hub support it.

Working around server-side gzip compression

One solution is to introduce a reverse proxy server that sits in front of your remote hub/driver. Such a proxy is to be deployed either on the same host where the hub/driver is running or within a common "fast" network shared with that hub/driver. The overall picture then looks like the schema below:

[Figure: compressing Selenium traffic]

In this article we’re going to implement this concept. We’re going to use the Nginx server as the reverse proxy and compress the traffic from the Selenium hub/driver on the fly.

Brief description of the steps

To demonstrate the concept we’ll prepare a "local cluster" with the help of Docker and the official Selenium and Nginx Docker images. We’ll also build our own image derived from the Nginx image that includes the configuration allowing us to compress Selenium traffic.

Since we’re going to use Docker, it helps if you know what Docker is, and even more if you have some hands-on experience with the tool. You also need to have Docker installed if you want to run the examples from this article.

As the final steps we will check the result with the curl tool and a simple Selenium test.

Creating a virtual network for our solution components

For the sake of demonstration we will prepare an isolated network space where our components will be running. The model we will implement is very close to what people have in their distributed environments. Let’s add a virtual network to our system. It is really simple:

docker network create grid

We have just added a virtual network of bridge type. We can refer to that new network using the name grid.

Bridge networks isolate the components running within them from the host. You have to explicitly publish the ports if you want them to be reachable from your host machine. User-defined bridge networks such as this one also provide a domain name resolution mechanism to the containers, so that you can access services using container names rather than IP addresses.

Our next steps are to prepare the reverse proxy image and to run it and the Selenium image within the newly created virtual network.

Prepare the Nginx image with our custom configuration

Now we’re building the image that runs the reverse proxy, which will compress the traffic on the fly. Fortunately, Nginx lets you configure such a proxy with a few lines of configuration. Let’s create a separate folder where we will keep all the files needed to build the image. Change your directory to that new folder and add a file named Dockerfile with the following content:

FROM nginx
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/

EXPOSE 4444

This file tells Docker that we take the default official Nginx image and add a configuration file, which we do not have yet. Let’s add one to the same folder. Name it nginx.conf and put the following content into it:

user       nobody nogroup;

events {
}

http {

  gzip on;
  gzip_proxied    no-cache no-store private expired auth;
  gzip_types application/json;
  gzip_min_length 1000;
  gzip_comp_level 9;
  proxy_redirect          off;
  proxy_set_header        Host            $host;
  proxy_set_header        X-Real-IP       $remote_addr;
  proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
  client_max_body_size    10m;
  client_body_buffer_size 128k;
  proxy_connect_timeout   90;
  proxy_send_timeout      90;
  proxy_read_timeout      90;
  proxy_buffers           32 4k;

  server {
    listen       4444;

    location / {
      proxy_pass      http://selenium-hub:4444;
    }

  }

}

Now everything is ready. A proxy with this configuration listens on port 4444 and forwards all requests (for all paths) to http://selenium-hub:4444, which resolves to the container where our remote driver/hub is running.

Run the "cluster" and do a smoke test

1) Build the reverse proxy image

First of all we need to build our custom Nginx image. From the folder you created, execute the command:

docker build -t nginx:grid .

The output should end with lines like these:

Successfully built bd4464953270
Successfully tagged nginx:grid

2) Start the Selenium container

The successful build means we can now start the reverse proxy. However, before we do that, we need to start the official Selenium container that packages WebDriver, Selenium Grid and a web browser. In my example I use the standalone Google Chrome image (standalone means the grid is configured to support only a single session), but you may choose among Chrome, Firefox and Opera.

docker run --name selenium-hub --network grid --shm-size=2g selenium/standalone-chrome

This command starts a container from the image within the recently created network named grid; in that virtual network the container gets the host name selenium-hub.

In a few seconds you should see a log message saying the service has started on port 4444. The port mentioned there is a so-called "container port", which means it is accessible only from other containers in the same virtual network (unless explicitly published).

3) Start the reverse proxy container

Now open a new terminal (or a new command line). Start the reverse proxy with the following command:

docker run -p 12345:4444 --network grid nginx:grid

This command starts our custom Nginx image within the same virtual network as the Selenium container we have just started. It also publishes the internal container port of Nginx (4444) on the host port 12345. You can now open your browser and navigate to http://localhost:12345/. The browser will display the Selenium welcome message, and the terminal where your proxy is running should log those requests.

4) Run a smoke test

Open a new terminal/command line so that you can see both the reverse proxy log and your new terminal. In your new terminal run the following command:

curl -v localhost:12345

Curl responds with some human-readable page content. At the same time the Nginx console shows a log line that contains the request type (GET), the protocol used (HTTP 1.1), the response code (200), the content size (in my case it is 1672 bytes) and the user agent (in my case it is "curl/7.68.0").

Now run a slightly modified command:

curl -v -H "Accept-Encoding: gzip" --output /dev/null  localhost:12345

The output of curl now shows a new header that came with the server response: "Content-Encoding: gzip". It also shows that the data transferred with the response is now about half the size. The Nginx logs confirm the same.

The above test shows that the compression works. Now we can move ahead and try it with Selenium screenshots.
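If you prefer to run the same smoke test from Java, the following is a minimal sketch using the JDK’s java.net.http client (Java 11 or newer). The class and method names are made up for this illustration, and the URL assumes the proxy is published on host port 12345 as in the steps above. It is not what the Selenium bindings do internally; it only mirrors the second curl command. This client does not decompress the body on its own, so the body length it reports is the size that actually travelled over the wire.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical demo class mirroring: curl -H "Accept-Encoding: gzip" localhost:12345
final class GzipSmokeTest {

    // Builds a GET request that tells the server the client accepts gzip.
    static HttpRequest gzipRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept-Encoding", "gzip")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<byte[]> response = client.send(
                gzipRequest("http://localhost:12345/"),
                HttpResponse.BodyHandlers.ofByteArray());

        // "identity" is used here only as a label for "no Content-Encoding header".
        String encoding = response.headers().firstValue("Content-Encoding").orElse("identity");
        System.out.println("status:           " + response.statusCode());
        System.out.println("Content-Encoding: " + encoding);
        System.out.println("bytes received:   " + response.body().length);
    }
}
```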

Demonstration of the effect

To demonstrate the effect of the enabled compression, let’s perform some steps using the curl tool (these are the low-level steps that your favorite Selenium bindings library would translate your test script into).

Establish a session with the web driver

All interaction with the browser through WebDriver is bound to a session object. To create one you need to invoke a dedicated endpoint. When a session is created, WebDriver starts a web browser.

curl -v -H "Content-Type: application/json" \
     -X POST \
     -d '{"desiredCapabilities": {},"capabilities": {"firstMatch": [{"browserName": "chrome"}]}}' \
     http://localhost:12345/wd/hub/session

Navigate to a web page

The previous command should output the session details. Take the session id from there and use it instead of 88e2296bfa38e745be8d01f78a2399d3 in all the following commands:

curl -v -H "Content-Type: application/json" \
     -X POST \
     -d '{"url": "https://webelement.click/en/welcome"}' \
     http://localhost:12345/wd/hub/session/88e2296bfa38e745be8d01f78a2399d3/url

Take a screenshot

Now we’re going to take a screenshot. The response won’t be compressed because we do not specify the expected compression method in the dedicated header.

curl http://localhost:12345/wd/hub/session/88e2296bfa38e745be8d01f78a2399d3/screenshot > base64.scr

Look at your current folder. You should now see the base64.scr file. Open it for editing, remove {"value":" from the beginning and "} from the end, and save the changes.

Decode the content

The file content is Base64-encoded. If you work on Linux, you probably have the base64 tool, which can decode the file. Otherwise you can use an online decoder.

base64 -d base64.scr > test.png

Look at your current folder again. There is a new PNG file there. Open it in any image viewer to make sure it is a properly formatted image file.
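The manual edit and the base64 call can also be done in one go from Java. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes the response has exactly the {"value":"..."} shape described above, with no line breaks inside the Base64 text. Run it against the unedited response saved by curl.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper that turns a raw screenshot response into PNG bytes.
final class ScreenshotDecoder {

    private static final String PREFIX = "{\"value\":\"";
    private static final String SUFFIX = "\"}";
    // Every PNG file starts with these eight bytes.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // Strips the JSON wrapper and decodes the Base64 payload.
    static byte[] decode(String responseBody) {
        String body = responseBody.trim();
        if (!body.startsWith(PREFIX) || !body.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Unexpected response shape");
        }
        String base64 = body.substring(PREFIX.length(), body.length() - SUFFIX.length());
        return Base64.getDecoder().decode(base64);
    }

    // Checks that the data starts with the PNG signature.
    static boolean isPng(byte[] data) {
        if (data.length < PNG_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (data[i] != PNG_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // base64.scr here is the untouched curl output, JSON wrapper included.
        byte[] png = decode(Files.readString(Path.of("base64.scr")));
        System.out.println("looks like PNG: " + isPng(png));
        Files.write(Path.of("test.png"), png);
    }
}
```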

Take a screenshot with compression

Let’s now add some compression to the response from Selenium WebDriver.

curl -v -H "Accept-Encoding: gzip" --output test.gz http://localhost:12345/wd/hub/session/88e2296bfa38e745be8d01f78a2399d3/screenshot

Now you should have one more file in your folder. The test.gz file contains the compressed response to the same request that you ran before without compression.
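To convince yourself that test.gz really holds the same response, you can unpack it. The Selenium client does this for you in real tests; the sketch below, with hypothetical class and method names, only shows the idea using the JDK’s java.util.zip classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for packing and unpacking gzip bodies in memory.
final class GzipBody {

    // Compresses the given bytes with gzip.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the original bytes from a gzip-compressed body.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = Files.readAllBytes(Path.of("test.gz"));
        byte[] json = gunzip(compressed);
        System.out.println("compressed:   " + compressed.length + " bytes");
        System.out.println("decompressed: " + json.length + " bytes");
    }
}
```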

Do not forget to release your session when you are done

The command below will kill the WebDriver session and close the browser window.

curl -X DELETE http://localhost:12345/wd/hub/session/88e2296bfa38e745be8d01f78a2399d3

Understanding the results

Let’s look at two files now: base64.scr and test.gz. The former is 50% bigger than the latter. This difference is your benefit: you save 1/3 of your traffic by enabling compression.

The trade-off is that you need to set up a reverse proxy on your hub/driver side, and compressing the traffic on the fly takes some computing resources.
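It may seem surprising that gzip helps at all, given that PNG data is already compressed. A likely explanation is that Base64 text uses only 64 distinct characters, that is 6 bits of information per 8-bit byte, and gzip can squeeze most of that slack out again. The sketch below illustrates this with pseudo-random bytes standing in for PNG data (an assumption made for the demo; real screenshots may compress somewhat better or worse). The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: how much of the Base64 overhead gzip wins back.
final class Base64GzipDemo {

    // Pseudo-random bytes as a stand-in for already compressed PNG data.
    static byte[] pseudoPng(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    // Returns the size of the data after gzip compression.
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        byte[] raw = pseudoPng(300_000, 42L);
        byte[] base64 = Base64.getEncoder().encode(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("Base64 bytes:     " + base64.length);
        System.out.println("Base64 + gzip:    " + gzippedSize(base64));
    }
}
```

In this model the gzipped size should land close to the raw size, so the compression roughly cancels the Base64 overhead.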

Simple trick to save some extra space

If you have a colorful application and you do not care much about the colors in your screenshots, you can benefit from converting your page to a gray-scale palette. This is mostly useful for manual analysis of screenshots during debugging, since colors usually matter when you test application styling rather than functionality.

Let’s look at a test script written in Java:

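import java.io.File;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

// The class name is arbitrary.
public class GrayscaleScreenshotTest {
    public static void main(String[] args) {
        WebDriver driver = null;
        try {
            ChromeOptions chromeOptions = new ChromeOptions();
            // Connect through the reverse proxy rather than to the hub directly.
            driver = new RemoteWebDriver(new URL("http://localhost:12345/wd/hub"), chromeOptions);
            driver.get("https://youtube.com");
            // First screenshot: the page in full color.
            File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            // REPLACE_EXISTING lets the test be rerun without deleting old screenshots.
            Files.copy(screenshot.toPath(), Path.of("/screenshots/wec_color.png"),
                    StandardCopyOption.REPLACE_EXISTING);
            // Switch the whole page to gray scale with a CSS filter.
            ((JavascriptExecutor) driver)
                    .executeScript("document.getElementsByTagName('body')[0].style['filter']='grayscale(1)';");
            // Second screenshot: the same page in gray scale.
            screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            Files.copy(screenshot.toPath(), Path.of("/screenshots/wec_gray.png"),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (driver != null) {
                driver.quit();
            }
        }
    }
}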

The script takes two screenshots of identical content, but before taking the second one, it executes the following JavaScript against the target web page:

document.getElementsByTagName('body')[0].style['filter']='grayscale(1)';

Let’s now look at our Nginx logs:

POST /wd/hub/session HTTP/1.1 200 997 - selenium/3.141.59 (java unix)
POST /wd/hub/session/.../url HTTP/1.1 200 14 - selenium/3.141.59 (java unix)
GET /wd/hub/session/.../screenshot HTTP/1.1 200 700386 - selenium/3.141.59 (java unix)
POST /wd/hub/session/.../execute/sync HTTP/1.1 200 14 - selenium/3.141.59 (java unix)
GET /wd/hub/session/.../screenshot HTTP/1.1 200 457857 - selenium/3.141.59 (java unix)
DELETE /wd/hub/session/... HTTP/1.1 200 14 - selenium/3.141.59 (java unix)

These entries are the low-level equivalents of our test script commands. Pay attention to the two entries that invoke the screenshot endpoint. As we can see, the page converted to gray scale is about 35% smaller than the same page in full color.
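The percentage comes straight from the two screenshot sizes in the log above. As a quick check, here is the arithmetic in Java (the class and method names are invented for the example):

```java
// Hypothetical helper for comparing two response sizes from the proxy log.
final class TrafficSaving {

    // Returns how much smaller "after" is than "before", in percent.
    static double savedPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long color = 700386; // screenshot response size in full color, from the log
        long gray = 457857;  // screenshot response size in gray scale, from the log
        System.out.println("bytes saved:   " + (color - gray));
        System.out.println("saved percent: " + savedPercent(color, gray));
    }
}
```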

So we have covered the most effective ways to reduce the traffic coming from WebDriver or Selenium Grid to your client test code. I really appreciate your feedback, so feel free to share your questions and thoughts with me.