Thursday, November 12, 2015

What can you do with code?

I recently started mentoring a local high school's FRC (FIRST Robotics Competition) team. Even though the challenge hasn't been announced yet, the team has started putting last year's robot back together, just to get into the rhythm. We are also trying to recruit more students for the software team, since those who programmed last year's robot will be graduating this year.

So I was tasked with getting these students familiar with the code. These are students who have had just a little introduction to programming, either through previous involvement in a robotics team or through an intro-level programming course. I myself would have to spend some time with the code and the API to understand how it all works before I could guide them. I couldn't get my hands on the code before our first meeting, so instead I thought of showing them some real-life code and the applications it powers. To make it fun, I showed them a snippet of each codebase first and had them try to guess what the application was.

Here are the 4 snippets of code and the applications (the slides are below as well):

1. I am a big fan of FPS games, and I figured the students must have played games like these, so they would be a good way to get them excited. I included the Doom 3 source code, as explained at http://fabiensanglard.net/doom3/index.php

2. I had to include the code that started the OSS revolution, so I included the starting point of the Linux kernel.

3. At this point, I didn't want the students to come away thinking that code can only be written by teams of very talented software programmers and takes years to write. So I included some code from the project that won the Astro Pi contest (http://astro-pi.org/competition/winners/). The code was written by students just like them, and I explained what it does and that it would be sent to the International Space Station on an upcoming launch.

4. Lastly, I wanted to include something fun, to show that code doesn't always have to have world-changing implications. I searched for some cool Raspberry Pi projects and found this: http://www.scottmadethis.net/interactive/beetbox/.

In the end, I told them that it would be great fun to work on this project as a team. In the last slide, I asked them not to think "What can you do with code?", but "What will you do with code?".

Friday, October 16, 2015

Test Automation Frameworks

In the test engineering team, one of the choices we often have to make is picking the best automation tool for a particular project. In terms of automation frameworks, one size does not fit all. Each tool/framework has its own strengths, which make it suitable for specific types of projects. In this blog post, I'd like to introduce to you (in no particular order) the various frameworks we use for test automation and the reasons they were chosen for their particular applications.

Robot Framework

Robot Framework is a generic test automation/execution framework. It's generic in the sense that it doesn't provide any real automation capabilities itself, but has pluggable underlying libraries that can be used for most automation needs. There is a Selenium library for browser automation, an Appium library for mobile and a Requests library for HTTP web service testing. The strengths of Robot Framework are its ease of use, flexibility and extensibility. You can create your tests in an easy-to-read text (or Markdown or HTML) format, and you can use the lower-level keywords provided by the libraries to create higher-level keywords that describe your application better. This is particularly beneficial for test engineers because it lowers the automation learning curve for other team members, like product managers and developers. Test engineers can provide the keywords in such a way that writing tests becomes easy: just a matter of picking the right keywords and providing the right arguments.
We use Robot Framework for some of our front end Rails projects. These projects have mostly design and text changes and are tested using the Selenium library. Once the features of these applications are described appropriately in keywords, it's easy to create or update tests. For example, here is a test that opens our home page, clicks on a link and validates that the "Apply Now" button is present.

*** Test Cases ***
| Validate About Page
| | [Documentation]             | Validate content in 'About Payoff' page
| | [Tags]                      | about
| | Open Browser To Home Page
| | Click Element               | link=About Payoff
| | Title Should Be             | About Us \| Next-Generation Financial Services \| Payoff
| | Page should contain element | id=btn-apply-now-get-started
| | [Teardown]                  | Close Browser

Example Robot Framework Test Case
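
The higher-level keyword "Open Browser To Home Page" in the example above would typically be defined in Robot's own keyword syntax, but keywords can also be backed by a small Python library, which is where Robot Framework's extensibility comes in. Below is a minimal sketch of such a library; the class name, home page URL and the Selenium2Library lookup are illustrative assumptions, not our actual code.

# PayoffKeywords.py -- a hypothetical custom keyword library; each public
# method becomes a Robot Framework keyword, e.g.
# open_browser_to_home_page -> "Open Browser To Home Page"
from robot.libraries.BuiltIn import BuiltIn

HOME_URL = 'https://www.payoff.com'  # assumed URL, for illustration only

class PayoffKeywords(object):

    def open_browser_to_home_page(self, browser='firefox'):
        # Reuse the Selenium2Library instance already imported by the test suite
        selenium = BuiltIn().get_library_instance('Selenium2Library')
        selenium.open_browser(HOME_URL, browser)
        selenium.maximize_browser_window()

Sketch of a Python Keyword Library (hypothetical)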

Capybara/RSpec

For testing our loan UI application, we use Capybara, a web automation framework. We use it with Selenium WebDriver to run the tests in a real browser, but it can also be used with headless drivers like capybara-webkit or Poltergeist. While Robot Framework works great for validating simpler UI applications, we need a more robust programming environment for testing the loan application because of the multiple steps and conditionals involved. We also have an external data source for this application, and it gets tedious to write data-driven tests with multiple steps in Robot Framework.
Capybara has a nice DSL for writing expressive UI tests. Another advantage of writing tests in Capybara is that we can use our internally developed gems (or external gems, if needed) to enhance the automated tests. For example, we use feature flags to disable/enable certain functionality in our applications; to test them, we might use these gems to turn the feature flags off or on, run the tests and then toggle them back to their original state.
Here's a snippet of a typical feature test written in Capybara:

feature 'Loan application flow' do
  scenario 'complete and submit an application for approval' do
    # homepage
    visit '/'
    # start application
    click_button('the-payoff-loan-apply-now')
    # loan amount
    expect(page).to have_content('How much do you owe on your credit cards?')
    fill_in('loan-amount', :with => '10000')
    click_button('linear-continue')
    # FICO score
    expect(page).to have_content('What\'s your FICO Score?')
    select('Excellent (720+)', :from => 'credit_score_bracket')
    click_button('linear-continue')
    # name
    expect(page).to have_content('What\'s your name?')
  end
end

Feature Test in Capybara

Karma/Jasmine

Karma is a JavaScript test runner for unit testing the JavaScript in web applications. The tests themselves can be written using frameworks like Jasmine. It is primarily used by our front end developers. Karma integrates with our development workflow, as the tests are included as part of the Rails package and run every time a build runs in our CI environment.

Airborne/RSpec

Airborne is a gem that makes it easy to write web service API tests using RSpec. The reason for using this gem as opposed to others is to make the tests and validations as descriptive as possible. We also have some APIs with a lot of endpoints or intermediate states to test, so it is especially important to make it easy to add new tests.
Here's a snippet of hypothetical tests for httpbin.org:

describe 'httpbin api service' do
  before(:all) do
    @url = 'http://httpbin.org'
  end
  it 'get returns original url in json response' do
    get @url + '/get'
    expect_json(url: 'http://httpbin.org/get')
    expect_json_sizes(args: 0)
  end
  it 'get with url params returns the params in json response' do
    get @url + '/get?spec=true'
    expect_json_sizes(args: 1)
    expect_json('args', spec: 'true')
  end
  it 'post returns data in json response' do
    post @url + '/post', { :spec => 'true' }
    expect_json('json', spec: 'true')
  end
end

API Tests using Airborne

TestNG

TestNG is a Java-based test framework that provides the capabilities to write automated tests in Java. For example, browser automation tests can be written using Selenium WebDriver's Java bindings, and REST-based services can be tested using libraries like rest-assured. We use TestNG for testing some of our internal REST services, specifically because TestNG makes it easy to use external data sources such as CSV or Excel files using DataProviders. It also integrates nicely with build tools like Maven and CI tools like Jenkins and Bamboo.
Overall, TestNG works well for automation projects in Java, but to leverage our internal toolset and Ruby expertise, we are using RSpec + Capybara or Airborne for automation.

Galen Framework

We have recently started investing some time in Galen Framework for testing the UI/UX of our front-end applications. This framework promises to solve the widespread problem of testing the layout of web applications across multiple display resolutions. So far, the framework looks promising and we are continuing to automate the layout testing of our marketing websites. We plan on partnering with our UX designers so that they can use this tool to test the layout as they make changes to the applications.
Galen Framework uses Selenium as the underlying automation tool, and the layout specs are written in its own spec language. In the layout spec below, we verify some elements on our marketing home page on two different layouts (desktop and mobile) with specific display resolutions:

@objects
    how-payoff-works        id hdr-how-payoff-works
    rates-fees              id hdr-rates-and-fees
    about-payoff            id hdr-about-payoff
= Verify Marketing Landing page Elements =
    @on desktop
        how-payoff-works:
            text matches "How Payoff Works"
            width ~131px
            height ~70px
            aligned horizontally all how-payoff-works 2px
        menu-button:
            absent
    @on mobile
        how-payoff-works:
            text matches "How Payoff Works"
            width 345 to 360 px
            height ~40px
            aligned vertically all how-payoff-works 2px
        menu-button:
            text matches "Menu"
            width ~81px
            height ~40px
            aligned horizontally all login-mobile 2px

Galen Framework Layout Spec

Galen Framework can handle differences between mobile and desktop screens, like cases where an element is visible in one and not the other. For example, the menu-button above is not visible at desktop screen resolutions, so it can be specified as absent in that layout.
The interaction with the UI can be automated using Java (or JavaScript) and, as mentioned earlier, Galen uses Selenium WebDriver for that. Here's an example that uses the above spec to validate the layout:


test("Desktop Landing Page Elements", function(){
    var driver = session.get("driver");
    marketingPage.waitForIt();
    checkLayout(driver, "./specs/marketingPage.gspec", "desktop");
});

Galen Framework Test

Appium

Appium is a mobile automation framework that allows writing tests using a common API for both iOS and Android platforms. Since it uses the WebDriver protocol, tests are written similarly to desktop browser tests that use WebDriver. Currently, we don't have a use case for native app testing, so we are content with using Appium for testing our websites in mobile browsers. Once we do, we will explore other tools/frameworks like UIAutomator for Android, UI Automation for iOS, or Calabash for both Android and iOS.
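
For mobile browser testing, the Appium session looks much like a desktop WebDriver session, just with mobile capabilities. Here is a minimal sketch using Appium's Python client; the capability values, server URL and page being checked are assumptions for illustration, not our actual tests.

# A hypothetical mobile-browser test driven through a local Appium server
from appium import webdriver

caps = {
    'platformName': 'Android',
    'deviceName': 'Android Emulator',
    'browserName': 'Chrome',   # drive the mobile browser instead of a native app
}

driver = webdriver.Remote('http://localhost:4723/wd/hub', caps)
driver.get('https://www.payoff.com')   # assumed URL, for illustration only
assert 'Payoff' in driver.title        # same WebDriver API as on desktop
driver.quit()

Mobile Browser Test Sketch with Appium's Python Client (hypothetical)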

Appendix: Selenium WebDriver

I couldn't end this post on test automation frameworks without talking more about Selenium WebDriver, which is the underlying tool for browser automation. There was a time in distant history when Mercury's WinRunner and, later, QuickTest Pro were the tools of choice for UI automation. At that point, the term Continuous Integration hadn't yet been popularized by Martin Fowler's writing, and the idea of test automation for most companies was to have some QA engineers write scripts on the side and run them manually to make repeated, time-consuming tests execute faster. QuickTest Pro was a commercial tool and cost quite a bit, AND it only supported Internet Explorer on Windows. Nevertheless, it was good for what it did, which was to reduce a bit of manual testing overhead for repeated regression testing. It also integrated well with Mercury's TestDirector (later Quality Center) for test management, so you could envision tying together requirements gathering, test planning, test execution and defect reporting within one tool.
But as IE lost ground to Firefox and Chrome, as mobile browsing grew more popular, and as HP acquired Mercury, QuickTest Pro started falling behind. More and more teams wanted to adopt agile methodologies, and HP's tools seemed very heavy to use and didn't fit in an agile environment. Around that time, Selenium started gaining popularity. It was open source, so test engineers could easily prototype an automation solution with it and convince their teams to start using it. Support for testing on different browsers was a huge bonus, as was its easy integration with continuous integration tools. The open source community behind it continued improving it with support from companies like Mozilla, Google and ThoughtWorks. The ability to write automated tests in any of the popular languages, and Selenium's integration with a number of frameworks like TestNG, JUnit, Capybara, Cucumber, Watir, etc., added to its appeal.
Paul Hammant wrote a blog post with his thoughts on why QTP was not relevant anymore. He has some graphs showing the popularity of Selenium vs. QTP on the job site indeed.com. Those are a bit old (2011), so I pulled up a similar graph myself, and it is very telling:

Job Trends from Indeed.com - Relative Growth (UFT vs QTP vs quicktest vs Selenium)

Appendix: Apache JMeter

While this post is about functional automation, I want to briefly mention Apache JMeter, which is the tool we use for load/performance testing. In a future post, I'll describe how we use it and how it integrates in our continuous deployment process.

Thursday, June 25, 2015

What Erno Rubik can teach you about hiring

I recently came across this video of an interview with Erno Rubik, the inventor of the Rubik's Cube, which is possibly the most popular puzzle/toy ever. The interview has fascinating insights into what led to the creation of the cube.


Apart from everything else he has to say, there are 2 things that really struck me. One, he says he's a "very ordinary man". Considering that he built something that has captured the imagination of, and had such a positive influence on, millions of people, it's truly inspiring to see how humble he is. Second, he says that if there's any special thing about him, it's that he loves what he does.

Thinking about it, those are the 2 most important qualities we look for when interviewing people for our team. One, that they are humble. No matter what your accomplishments or skills, if you go around beating your chest about them, you're not going to be able to work cohesively in a team. We will not hire so-called rockstars who can deliver 10 times more than a normal engineer but have trouble having a conversation without berating or criticizing someone.

The 2nd quality - to love what you do - is just as important. It is what drives people to pursue mastery of a skill regardless of the end result. It is what makes you continuously improve yourself whether you're successful in the short term or not. And it is what makes good team players and helps deliver successful products.

In essence, there is a special talent in celebrating your skills and accomplishments in a humble way, which Erno Rubik embodies. And someone with that talent, combined with a love of their work, is always a pleasure to know and work with.



Tuesday, March 3, 2015

But it works on my machine!!

I've heard this phrase numerous times while testing and communicating an issue/bug to a developer: "But it works on my machine!". For some developers, it's the first thing they say, sometimes even before I finish describing the exact sequence of events. And you'd think that I would have learned to handle this situation gracefully by now, but I still have to resist the urge to smack them, drag them to my desk (or wherever the tests ran), and show them the error.

Well...I wouldn't be writing this just for that. I recently faced this issue myself. I've been working on an MQTT keyword library for Robot Framework. This library provides keywords to publish and subscribe to an MQTT broker. The source code is here: https://github.com/randomsync/robotframework-mqttlibrary

One of the keywords that is part of this library is 'unsubscribe'. This lets a durable client (one that subscribed with clean session set to false) unsubscribe from a topic so that it doesn't receive any further messages published to the broker on that particular topic. If the client disconnects without unsubscribing, the subscription is still valid and the broker will deliver all messages received on that topic when the client next reconnects.
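
To make the durable-client behavior concrete, here is a minimal sketch using the underlying paho-mqtt client directly; the broker address, topic and client id are placeholders, and this is not the library's actual implementation.

import paho.mqtt.client as mqtt

# A "durable" client: clean_session=False makes the broker remember the
# subscription (and queue QoS 1/2 messages) across disconnects.
client = mqtt.Client(client_id='client-a', clean_session=False)
client.connect('iot.eclipse.org', 1883)
client.subscribe('test/mqttlib/durable', qos=1)

# Unsubscribing tells the broker to stop holding messages for this topic...
client.unsubscribe('test/mqttlib/durable')
client.disconnect()
# ...whereas disconnecting without unsubscribing leaves the subscription in
# place, and queued messages are delivered on the next connect as 'client-a'.

Durable Client Subscribe/Unsubscribe Sketch with paho-mqtt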

A test for this keyword is:
Step 1. Connect, Subscribe and Unsubscribe from a topic with a durable client (Client A)
Step 2. Publish messages to the topic with a different client (Client B)
Step 3. Connect as Client A, Subscribe and ensure that messages published by Client B are NOT received.

I wrote the test using Robot Framework and it worked on my Mac. To run these tests, I'm using a local mosquitto broker and also a public broker provided by the Eclipse project at http://iot.eclipse.org. When running the tests from my Mac against both the local broker and the Eclipse broker, the test verified that after unsubscribing and reconnecting, no messages were delivered. I pushed the change.

I also have the project set up to build on travis-ci.org: https://travis-ci.org/randomsync/robotframework-mqttlibrary. To my dismay, that test failed on travis-ci. WTF? "But it works on my machine!!"

Typically, unless there's something obvious that you overlooked, the only way to tackle these kinds of issues is a process of elimination. We try to account for differences between the local and remote environments and determine whether any one of those differences, or a combination of them, might be the culprit. Of course, in these kinds of scenarios, it helps if the local machines that you build on are as similar to the build/deploy servers as possible. (At Amazon, all engineers are given an RHEL VM instance to develop on, which is what is used for production deployments as well.)

In my case, the differences were:
Local environment: Mac, Python 2.7.6, pip 1.5.6
Travis build instance: Ubuntu 12.04, Python 2.7.9, pip 6.0.7

Other dependencies were installed through pip and *should* be the same:
paho-mqtt: 1.1
robotframework: 2.8.7

The target server iot.eclipse.org is running mosquitto version 1.3.1, and locally I have version 1.3.5 running.

So the first thing I could eliminate easily was the broker. I ran the tests from my machine using iot.eclipse.org as the target and they passed. Still, I went through the release notes for mosquitto server to see if there were any changes between 1.3.1 and 1.3.5 that might provide a clue.

The next thing I looked into was somehow re-creating locally the VM instance Travis uses, so I could debug better, because not having access to any logs or to the machine where the tests fail is a major hindrance. I found some helpful articles [1] [2] [3]. There's also an option to upload the build artifacts to S3, as described here.

At that time, I didn't get a chance to try any of these. Ideally, and as I mentioned before, you should have an easily accessible build environment that is as close to production as possible. So in the long term, having a local instance similar to what travis-ci uses will help in debugging build issues. In this case, I found that the tests failed when running on a local Windows machine as well, which made it easier to debug.

One of the things I had a hunch about right from the start was that I was not waiting long enough for 'unsubscribe' to complete. What if I send a 'disconnect' so quickly that the broker hasn't even finished processing the 'unsubscribe' packet? I was able to confirm this by adding a 1-second sleep after unsubscribing on the Windows machine. After adding that, the tests passed.

Obviously, adding sleeps is not the correct fix. The Paho client's documentation suggests using one of the 'loop*' functions: http://eclipse.org/paho/clients/python/docs/#network-loop. These allow you to wait for and confirm that a message was sent or received. I had overlooked these before, but I went ahead and added them to the connect and subscribe functions (I still need to do that for publish and disconnect) and was able to verify that the unsubscribe test worked without the sleep.
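
As a rough sketch of that approach (not the library's exact code), the keyword can start paho's network loop and wait for the broker's acknowledgement instead of sleeping; the callback usage, topic and timeout below are illustrative.

import time
import paho.mqtt.client as mqtt

unsubscribed = False

def on_unsubscribe(client, userdata, mid):
    # Called by paho once the broker acknowledges the UNSUBSCRIBE packet
    global unsubscribed
    unsubscribed = True

client = mqtt.Client(client_id='client-a', clean_session=False)
client.on_unsubscribe = on_unsubscribe
client.connect('iot.eclipse.org', 1883)
client.loop_start()                     # network loop runs in a background thread

client.unsubscribe('test/mqttlib/durable')
deadline = time.time() + 5
while not unsubscribed and time.time() < deadline:
    time.sleep(0.1)                     # wait for the ack instead of a fixed sleep

client.loop_stop()
client.disconnect()

Waiting for the Unsubscribe Acknowledgement with paho-mqtt (sketch)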

Conclusion:
  1. Inconsistent test failures are the bane of test automation. They undermine the value provided by test automation. Follow these as best practices:
    1. Design robust automated tests. DO NOT add an automated test if it's not 100% reliable. I would much rather have 1 reliable test than 10 unreliable tests.
    2. Have a build environment available locally that is very similar to (if not the same as) the one used by your CI hosts
  2. But, just because a test is failing inconsistently doesn't always mean it's a test issue. It can be a bug in the code, as seen above. It definitely helps if the test automation engineers know how the application is implemented and can look at and understand the code. Sometimes, just looking at the code gives you ideas on what kind of edge conditions to test for. Sometimes, you just get lucky and find an issue which may have been overlooked.
  3. I still don't know why the tests pass on a Mac and consistently fail on a Windows/Ubuntu (Travis) machine. The Python version is different, but I didn't get a chance to evaluate that. Could there be a difference in how network packets are sent/received in whatever libraries the two versions of Python are using? There's also a slight chance that there's a bug in some client/broker implementation if the tests fail inconsistently.
    Next steps:
    • Set up a virtualenv on the Mac so I can use different versions of Python
    • Set up a local image similar to the one used by travis-ci