What Selenium can and what Selenium cannot do

Today we’re going to talk about some very basic things. Anyone who wants to build a career in web application test automation with #Selenium needs a clear understanding of these basics. And we’re going to start with the part most people dislike. Namely...

A short history

As an early implementation of the idea of automated browser interaction, Selenium first saw the light of day back in 2004. At the beginning it was a sort of JavaScript injection mechanism, but the first public release introduced a concept much closer to what we have now (though still quite different). That concept involved a "server-side" component which, on the one hand, managed browsers by injecting JavaScript instructions into them and, on the other hand, exposed a protocol that allowed test code to interact with that server-side component. The test code could be written in any of several supported general-purpose programming languages (such as Java or C#). The project was named Selenium Remote Control (RC). By the way, the set of libraries that provides the basic functionality for accessing that "server component" from a given language is often called a binding.

Then came Selenium Grid. It was much like Selenium RC, but it allowed a user to run the same test in parallel using different browsers on different computers. All of that was achieved purely through grid configuration; no special support in the test code was required.

That was pretty convenient, but it had its flaws. Browser vendors, quite naturally, did not stand still: they released new versions of their products, and after each release the Selenium developers often had to spend time supporting the changes in Selenium’s code. Moreover, the number of different browsers grew over time. Under those circumstances, the approach used to keep Selenium integrated with the browsers became inefficient.

In 2012 the World Wide Web Consortium (W3C), the community that develops standards for the Web, introduced a draft of the WebDriver specification. The specification was needed to implement a new model in which the browser vendors themselves develop their own "server components", because nobody knows how to control a browser as effectively as its own vendor. Such components got the name "web drivers". From then on, the Chrome developers had to deliver their own driver, the Firefox developers theirs, and the Internet Explorer developers theirs. Each such component, in turn, had to comply with the W3C WebDriver specification, which is what allows any binding to work with any compliant driver.
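To make the split between bindings and drivers a bit more concrete, here is a minimal Python sketch of the kind of HTTP request a binding prepares for a driver. The endpoint paths and payload shapes follow the W3C WebDriver specification; the `ENDPOINTS` table, the `build_request` helper and the session id `abc123` are made-up names for illustration, not part of Selenium, and nothing is actually sent over the network.

```python
import json

# A handful of endpoints defined by the W3C WebDriver specification.
# The specification defines many more; this table is only an illustration.
ENDPOINTS = {
    "new_session": ("POST", "/session"),
    "navigate_to": ("POST", "/session/{session_id}/url"),
    "get_title": ("GET", "/session/{session_id}/title"),
    "find_element": ("POST", "/session/{session_id}/element"),
    "delete_session": ("DELETE", "/session/{session_id}"),
}


def build_request(command, session_id=None, payload=None):
    """Return (HTTP method, path, JSON body) for a WebDriver command.

    Hypothetical helper: a real binding would also send the request
    to the driver and decode the JSON response.
    """
    method, template = ENDPOINTS[command]
    path = template.format(session_id=session_id)
    # Only POST commands carry a JSON body; GET and DELETE have none.
    body = json.dumps(payload if payload is not None else {}) if method == "POST" else None
    return method, path, body


# The same three requests could be sent to any compliant driver.
print(build_request("new_session", payload={
    "capabilities": {"alwaysMatch": {"browserName": "chrome"}}}))
print(build_request("navigate_to", "abc123", {"url": "https://example.com"}))
print(build_request("find_element", "abc123", {"using": "css selector", "value": "h1"}))
```

Because every compliant driver understands the same requests, a binding only has to know where the driver is listening, not which browser sits behind it.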

What Selenium can do

After that history (which probably wasn’t that short, sorry about that) we finally move on to the declared topic of the article. So, what is Selenium capable of? First of all, it depends on the WebDriver implementation you’re going to use and on how well it conforms to the specification. Let’s assume the specification is fully implemented. The key Selenium capabilities are listed below:

  • Navigation: navigating to a given URL, obtaining current URL, going backward, going forward, refreshing the page, obtaining the page title.

  • Working with the so-called command context (i.e. how the framework knows what it should currently act on when there are several candidates): switching between windows, switching between frames, changing the position or size of a window, closing a window.

  • Looking up elements using locators: finding a single element or a collection of elements by locator types such as XPath or CSS selectors.

  • Obtaining the state of an element: checking whether the element is selected, whether it is enabled or disabled, querying the value of one of its attributes, its tag name, its inner text, and its bounding rectangle (the area used to render the element).

  • Interacting with an element: clicking the element, clearing a field, sending a sequence of keystrokes to the element.

  • Interacting with the page as a document: obtaining the source code of a page, injecting custom JavaScript code and executing it either synchronously or asynchronously.

  • Working with cookies: obtaining the value of a cookie, adding a new cookie, deleting a cookie.

  • Working with alerts: obtaining the alert’s text, accepting the alert, dismissing the alert.

  • Working with screenshots: taking a screenshot of the page as currently shown in the browser window, taking a screenshot of a particular element on the page.
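A few of the capabilities above can be sketched with the Python binding. This is only an illustration under some assumptions: the function names `collect_page_facts` and `run_in_chrome` are made up for this article, the page is assumed to contain an `h1` element, and Selenium 4 with a matching Chrome driver is assumed to be available when `run_in_chrome` is called.

```python
def collect_page_facts(driver, url):
    """Use several WebDriver capabilities and return what was observed.

    `driver` is expected to behave like a Selenium WebDriver instance.
    """
    driver.get(url)  # navigation

    # Locator lookup. "css selector" is the string behind By.CSS_SELECTOR.
    # find_element raises an exception if the page has no <h1>.
    heading = driver.find_element("css selector", "h1")
    links = driver.find_elements("css selector", "a")

    return {
        "url": driver.current_url,                       # navigation state
        "title": driver.title,
        "heading_tag": heading.tag_name,                 # element state
        "heading_text": heading.text,
        "heading_enabled": heading.is_enabled(),
        "link_targets": [link.get_attribute("href") for link in links],
        # JavaScript executed synchronously inside the page
        "ready_state": driver.execute_script("return document.readyState"),
        "cookie_names": [cookie["name"] for cookie in driver.get_cookies()],
        # Screenshot returned as PNG bytes; only the size is kept here
        "screenshot_bytes": len(driver.get_screenshot_as_png()),
    }


def run_in_chrome(url):
    """Assumed entry point: start Chrome, collect the facts, always quit."""
    from selenium import webdriver  # imported here so the helper above has no hard dependency

    driver = webdriver.Chrome()
    try:
        return collect_page_facts(driver, url)
    finally:
        driver.quit()
```

Calling `run_in_chrome("https://example.com")` should return a dictionary of observations. Note that nothing in it decides whether those observations are correct; that part is left to other tools.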

What Selenium cannot do

It would be logical to assume that Selenium cannot do anything that is missing from the capabilities list. However, experience shows that people hold a number of misconceptions about which cases they can handle with the Selenium framework alone. What are they?

  • Selenium is not a test framework (omg, what??): Selenium is a tool that automates interaction with browsers. Selenium itself has no functionality for preparing, executing, or finishing tests, or for asserting their results. Selenium also cannot build reports summarizing test execution, etc. If you want those features in your tests, you will have to learn dedicated libraries and frameworks. Not Selenium, unfortunately.

  • Selenium loses its strength on the border between the browser and the OS: for example, when you need to save a file from your web application to the hard drive. This is because the save dialog is controlled not by the browser but by the operating system. To implement such a scenario you will have to use other libraries and frameworks. Not Selenium, again. The same applies to printing pages, interacting with the context menu, etc.

  • Selenium might be useless even in some cases where you are trying to interact with elements on a page: some browsers can render things on the page that are not part of the DOM tree and are not HTML elements. For example, plugins installed in Internet Explorer can draw UI inside a page, but that UI is not accessible to Selenium. The same applies to obsolete technologies like Java applets or Adobe Flash.
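The first point in the list can be illustrated with a small Python sketch in which the standard `unittest` module supplies everything Selenium lacks: setup, teardown, assertions and a summary of the run. The helper names `make_title_test` and `run` are invented for this example, and the driver factory is passed in as a parameter so that any object with `get`, `title` and `quit` can play the driver role.

```python
import unittest


def make_title_test(driver_factory, url, expected_fragment):
    """Build a TestCase class that checks the title of the page at `url`.

    `driver_factory` is any callable returning a driver-like object,
    for example `webdriver.Chrome` from the Selenium binding.
    """

    class TitleTest(unittest.TestCase):
        def setUp(self):
            # Preparing the test: provided by unittest, not by Selenium.
            self.driver = driver_factory()

        def tearDown(self):
            # Finishing the test: again the framework's job.
            self.driver.quit()

        def test_title_contains_fragment(self):
            self.driver.get(url)                 # Selenium's part: drive the browser
            self.assertIn(expected_fragment, self.driver.title)  # unittest's part: assert

    return TitleTest


def run(test_case):
    """Execute the test case and return the result object built by unittest."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With Selenium installed and a Chrome driver available, something like `run(make_title_test(webdriver.Chrome, "https://example.com", "Example"))` should open the browser, check the title and print a short report. Only the `get` and `title` calls in that flow come from Selenium.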

This is, I guess, the basic knowledge one needs in order to understand what Selenium is and what Selenium is NOT before going deeper into the topic, and to plan automated test implementation more reliably and effectively. If you still have questions, please feel free to contact me. I will try to extend the article with any missing points based on your feedback.