Wednesday, 9 January 2013

Launching Australia's Biggest News Site


For years I had watched our engineering practices fall short of what the rest of the industry was doing. As a development team we weren't focusing on quality, and everything we did was very manual. This caused me great discomfort, so I began preaching the values of agile and the practices from extreme programming that I had had success with in the past. For a while I felt it was going nowhere, until certain key people were hired who made that journey easier (you know who you are).

The journey began in early 2011, when we were faced with a difficult decision. We had built an internal CMS that had met our needs for the last 10 or so years. However, as with many software solutions, the years had not been kind to it. It was basically built for one purpose: to build web sites. Now we were facing the dilemma of multiple devices. Development was done through a text area and the code was not version controlled. We had hit a brick wall with its capabilities. The question arose: do we rebuild it, or do we look at a paid product?

Previous attempts to rebuild the CMS had pretty much answered that question, so we decided to start searching for a product. We needed one that was extensible and built using .NET, so we thought the best product for us would be Sitecore.

To be successful from an engineering point of view, we wanted to accomplish the following:
  1. Build a new website using a new CMS product.
  2. Introduce quality into the product using TDD and BDD.
  3. Implement DevOps.
This was a very ambitious goal and a massive transformation.

We were already using Scrum on other projects; however, we all agreed that what we were doing was not really Scrum, more like mini waterfall. We were building things in sprints and releasing months later (due to technical and process issues). So we began to plan our first iteration.

Sprint Zero

People often underestimate the importance of sprint zero. We were among them. Don't get me wrong, it's not like we just jumped into it. Our sprint zero looked like this:
  1. Get a backlog of features that we needed to build.
  2. Set up a continuous integration environment along with version control.
  3. Set up a source code structure.
What was the most important step we left out? We had not tackled the problem of releasing our software. The issue was that we really didn't know where our solution would end up, and we also knew that a feature would not be complete any time soon, so we left it. The lesson for me was that it is important to set up a process that will deploy software, as code that is just sitting in version control is wasted.

Process

We started off with Scrum and implemented the following routine:
  1. Two week sprints
  2. Two hour planning meeting at the beginning using planning poker
  3. Sprint Review Meeting
  4. Sprint Retrospective
Our estimation was done using T-shirt sizes, which were converted to points for tracking. We tracked our progress using burn-up charts. This approach measures the estimated points left against the actual points being delivered. To me this gives a false sense of progress. I think a better measure is how many stories you are completing in an iteration versus how many stories are left.
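To make the story-count idea concrete, here is a minimal sketch in Python. It is not something we actually ran. The function name and the numbers are made up for illustration, and it assumes that average throughput over past iterations is a fair guide to the future.

```python
from math import ceil


def iterations_remaining(completed_per_iteration, stories_left):
    """Estimate how many iterations are left, based on story throughput.

    completed_per_iteration: stories finished in each past iteration.
    stories_left: stories still in the backlog.
    """
    if not completed_per_iteration:
        raise ValueError("need at least one completed iteration")
    throughput = sum(completed_per_iteration) / len(completed_per_iteration)
    if throughput == 0:
        raise ValueError("no stories completed yet, cannot forecast")
    # Round up: a partly used iteration is still an iteration.
    return ceil(stories_left / throughput)


# Hypothetical history: three iterations, 20 stories still to do.
history = [4, 6, 5]
print(iterations_remaining(history, 20))
```

Counting finished stories sidesteps the argument over what a point is worth, although it only works well when stories are sliced to a roughly similar size.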

Our board was modelled around the Lean approach of mapping the value stream. We were also using LeanKit to measure lead and cycle time. We later dropped LeanKit to use Jira with GreenHopper; however, this turned out to be more complicated than just using LeanKit. I enjoyed measuring these times, as I found that having a conversation around statistics at your retrospective gives you a great way to talk about continuous improvement.
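For readers who have not measured these before, here is a rough Python sketch of the two numbers. It assumes the common definitions: lead time runs from the moment a card is created to the moment it is done, and cycle time runs from the moment work starts to the moment it is done. The `Card` class, its fields and the dates are hypothetical and are not taken from LeanKit or Jira.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Card:
    title: str
    created: date  # card was added to the board
    started: date  # someone began working on it
    done: date     # it was released

    @property
    def lead_time(self) -> int:
        """Days from request to delivery."""
        return (self.done - self.created).days

    @property
    def cycle_time(self) -> int:
        """Days from start of work to delivery."""
        return (self.done - self.started).days


def averages(cards):
    """Return (average lead time, average cycle time) in days."""
    return (
        mean(c.lead_time for c in cards),
        mean(c.cycle_time for c in cards),
    )


# Hypothetical cards, purely for illustration.
cards = [
    Card("Article page", date(2012, 3, 1), date(2012, 3, 5), date(2012, 3, 9)),
    Card("Gallery", date(2012, 3, 2), date(2012, 3, 12), date(2012, 3, 14)),
]
avg_lead, avg_cycle = averages(cards)
print(f"lead: {avg_lead} days, cycle: {avg_cycle} days")
```

A large gap between the two averages suggests cards are waiting in the queue rather than being worked on, which is a good conversation starter at a retrospective.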

So what was the end result? Well, we started using more of a lean approach with Scrumban. We had weekly sprints and we didn't have long planning meetings. We would try to raise issues as they appeared and put them on the board to remind us to tackle them, and we would still have retrospectives every few weeks.

Story Writing/Use Cases

We were all big fans of User Stories Applied, so we started to write stories in the following format:

As a <role>, I want <goal/desire> so that <benefit>

During the story writing sessions we realised that the last part of the story was often left out, and Mike Cohn says that it is optional. I started to disagree: if you can't explain the reason, is it really worth doing? I think this format would have been better:

In order to <receive benefit> as a <role>, I want <goal/desire>

This format puts the benefit first, which I think is important. We all know that a story is a promise for a conversation; in practice, however, this did not always happen and conversations were left out.

The acceptance criteria were captured using the Gherkin language. Unfortunately this was easier said than done, as it was new and people in general find it hard to describe the system. Because of this, the development team started writing the acceptance criteria. To me this felt completely wrong, for the following reasons:
  1. Usually the rest of the team would not review the criteria (product owner, scrum master)
  2. The language would turn into developer speak, which means the rest of the team would not get it.
If the development team is going to write the specs, I find it easier to follow your trusted way of testing, as the overhead of the Gherkin language is not worth it. One needs to remember that BDD is a collaborative approach.

Testing

As I mentioned above, we held quality in very high regard, so we decided to have the following levels of testing:
  1. Unit Testing
  2. Integration Testing
  3. Acceptance Testing
We had some challenges with the testing:
  1. Unit testing is hard to achieve using Sitecore. So we had to create a thin abstraction around it.
  2. All of Sitecore is driven by a configuration file (roughly 4000 lines). We had to customise the config file to be able to do integration testing. This meant that we weren't really testing the same system. Due to this we had to create a matrix of the parts that we could test and the parts that we should avoid, which is not ideal.
  3. Due to these complications we started writing more acceptance tests as Sitecore needed a full HTTP context to be present.
  4. Acceptance tests were sometimes written to hit an API and sometimes to hit the UI. This proved to be complicated as we now had tests that looked like integration tests so the line was blurred.
These challenges proved costly to us. The biggest cost is that the build we need before we can release a change takes over an hour, and some of our tests are flaky.
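To illustrate the thin abstraction mentioned in the first challenge, here is a minimal sketch. Our real code was .NET against Sitecore, so this Python version only shows the shape of the idea. Every name in it (`ContentRepository`, `HeadlineService`, `InMemoryRepository`, the "Title" field) is hypothetical and is not a Sitecore API.

```python
from typing import Protocol


class ContentRepository(Protocol):
    """The thin abstraction: the only thing our logic knows about the CMS."""

    def get_field(self, item_path: str, field: str) -> str: ...


class HeadlineService:
    """Business logic that depends on the abstraction, not on the CMS."""

    def __init__(self, repo: ContentRepository) -> None:
        self._repo = repo

    def headline(self, item_path: str) -> str:
        title = self._repo.get_field(item_path, "Title").strip()
        return title or "Untitled"


class InMemoryRepository:
    """Test double that stands in for the CMS in unit tests."""

    def __init__(self, items: dict) -> None:
        self._items = items

    def get_field(self, item_path: str, field: str) -> str:
        # Missing items or fields come back as an empty string.
        return self._items.get(item_path, {}).get(field, "")


# A unit test needs no HTTP context or config file, just the fake.
repo = InMemoryRepository({"/home/story": {"Title": "  Launch day  "}})
service = HeadlineService(repo)
print(service.headline("/home/story"))
print(service.headline("/missing"))
```

The production implementation of the abstraction would be the only class that touches the CMS, which keeps the slow, context-heavy tests limited to that one thin layer.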

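The lesson here is to always pay attention to the testing pyramid: lots of fast unit tests at the bottom, fewer integration tests in the middle, and only a handful of slow end-to-end tests at the top.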

DevOps

This area of our journey proved to be the most difficult. The reason I say this is that the culture needs to change for DevOps to be successful.

We are all big fans of Continuous Delivery, so we wanted to make sure that we made this a reality. The biggest challenge that we faced was actually having somewhere to deploy our application to. Long story short, we fought with some hosting providers and decided that to really embrace continuous delivery we needed to go to the cloud.

Here is a summary of the tools that we used to make DevOps a reality:
  1. Continuous integration started with Bamboo; however, we later moved to TeamCity.
  2. All the scripting was done using PowerShell. The reason for this is that our application is based on .NET and is deployed on Windows. In hindsight, writing PowerShell is easy; however, maintaining a large PowerShell codebase with tests and specs is very primitive. Looking forward, we would use Ruby.
  3. We implemented our own package manager that is based on NuGet. The package manager code is based off the following code. Moving forward, I would consider using Puppet.
  4. All of our instances are prepared using Puppet and the initial infrastructure is set up using CloudFormation.
As mentioned previously, our biggest challenge was to bring the two teams together. Some of the things that I found were as follows:
  1. Not everyone in operations believes they should be doing infrastructure as code.
  2. Not everyone in the development team cares about infrastructure as code and understanding infrastructure.
Due to the above reasons it is tempting for organisations to create a DevOps team. I don't agree with this, as it is important for the organisation to come up with its own definition of DevOps. If you know you don't have the right fit then go ahead and find it!

Conclusion

We had a great journey and have learnt so much about Agile, Continuous Delivery and DevOps. It wasn't always a smooth ride; however, we were able to pull it off, and there are still lots of things that we can improve. Some areas that we want to concentrate on are:
  1. Design for failure.
  2. Implement a minimum viable product to justify the build of a feature.
  3. Implement a self-healing system.
  4. Puppetise all of our infrastructure.