Alex Falkowski: 2012

Tuesday 11 December 2012

Is NuGet really open source?

So about 6 months ago I submitted a patch to try to fix the out of memory exception that we were getting with NuGet, due to having large packages.

I discovered this because I wrote a tool that emulates apt-get using NuGet, which is on my GitHub. I really saw the potential in using NuGet for .NET projects.

Unfortunately my patch got rejected after a 6 month code review with the following message:

To update everyone on this issue, I'm planning to reduce the overall usage of MemoryStream in NuGet.Core. In many cases, we load the package contents into memory unnecessarily, hence the OOM exception. It will address the issue better than trying to band-aid it with writing to a temporary file, which could potentially affect perf. I'm sorry to decline the pull request, but rest assured that the OOM exception will get fixed in 2.3.

So why the disappointment? Well, rather than work in an open source way, Microsoft decided that they can do it better. It is just sad that this software giant does not understand OSS.

Friday 24 August 2012

DevOps – Misunderstandings

DevOps has been an interesting and confusing term in our industry. I thought I would write about this because I had an interesting conversation at work and it seems that we all had different ideas of what this lovely word means.

Let's look at how Wikipedia defines it:

DevOps is a software development method that stresses communication, collaboration and integration between software developers and information technology professionals. DevOps is a response to the interdependence of software development and IT operations. It aims to help an organisation rapidly produce software products and services.

So what does this mean?

DevOps is purely a culture, a way of thinking and interacting.

The biggest confusion that I saw with my peers is that DevOps is a team that needs to be formed as a separate entity. This is wrong!

DevOps is collaboration between people from development, operations and testing. This gathering is controlled purely by the nature of a self-organising team. They don't report to a DevOps manager.

Another misconception that I noticed was that Infrastructure As Code is DevOps. This is wrong!

Infrastructure As Code came from the understanding that people were tired of having Snowflake Infrastructure and a lot of these problems have been solved in the development world. This really means that we are being pragmatic and solving our own problems by learning from each other. Infrastructure As Code is the key to implementing DevOps concepts.

Hopefully this clarifies some of the misunderstandings. Further information about DevOps can be found here.

Saturday 18 August 2012

Functional Programming in PowerShell

A while back I got introduced to the concept of functional programming and, like most of you out there, I found it hard to understand some of the concepts. Over time I have slowly started to understand some of them and thought I would try to capture them once and for all, as I have the memory of a goldfish. Please note that it is important to try to understand these concepts if you are going to understand this blog post.

If you would like to learn more about it I suggest this great book titled Real-World Functional Programming.

Functional Programming is defined as:

A programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasises the application of functions, in contrast to the imperative programming style, which emphasises changes in state.

Functional Programming uses a style called Declarative programming rather than your traditional Imperative programming. What do these styles mean?

Declarative Programming is defined as:

A programming paradigm that expresses the logic of a computation without describing its control flow. Many languages applying this style attempt to minimise or eliminate side effects by describing what the program should accomplish, rather than describing how to go about accomplishing it.

Imperative Programming is defined as:

In computer science, imperative programming is a programming paradigm that describes computation in terms of statements that change a program state. In much the same way that imperative mood in natural languages expresses commands to take action, imperative programs define sequences of commands for the computer to perform.

The following are features that functional programming languages commonly exhibit:

Immutable Object
Higher Order Functions
Currying
Lazy Evaluation
Continuations
Pattern Matching
Closures

Luckily for us:

PowerShell is motivated by functional programming

Right, so let's go through these points and how they apply to PowerShell.

Immutable Object

An immutable object is an object whose state cannot be modified after it is created. This is accomplished as follows:

Describe "Immutable Object" {
    It "Should create immutable object" {
        $immutable = New-ImmutableObject @{ Name = "test"}
        $immutable.Name.Should.Be("test")

        try {
           $immutable.Name = "test1"
           $false.Should.Be($true)
        }
        catch [System.Management.Automation.SetValueException] {
            $true.Should.Be($true)
        }
    }
}

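function New-ImmutableObject($object) {
    $immutable = New-Object PSObject

    # Expose each hashtable entry as a getter-only ScriptProperty.
    $object.Keys | ForEach-Object {
        $value = $object[$_]
        # Capture the current value so that the getter keeps returning it.
        $closure = { $value }.GetNewClosure()
        $immutable | Add-Member -Name $_ -MemberType ScriptProperty -Value $closure
    }

    return $immutable
}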

Adding a property of type ScriptProperty with only a getter script block makes it read-only. When we try to modify it, PowerShell throws a System.Management.Automation.SetValueException. The example uses a Hashtable as the input; however, it is quite easy to extend it to use any type of object.
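
The extension to arbitrary objects could look something like the sketch below. The name New-ImmutableCopy is hypothetical and not part of the original code. It assumes the source object's properties can be read through PSObject.Properties, and it exposes each one as a getter-only ScriptProperty in the same way as New-ImmutableObject. The copy is shallow, so a property that holds a mutable object can still be changed through that object.

```powershell
# Hypothetical helper: builds a read-only snapshot of any object's properties.
function New-ImmutableCopy($source) {
    $immutable = New-Object PSObject

    foreach ($property in $source.PSObject.Properties) {
        # Copy the value into a local so each closure captures its own value.
        $value = $property.Value
        $getter = { $value }.GetNewClosure()
        $immutable | Add-Member -Name $property.Name -MemberType ScriptProperty -Value $getter
    }

    return $immutable
}

$person = New-Object PSObject -Property @{ Name = "test"; Age = 30 }
$frozen = New-ImmutableCopy $person

$person.Name = "changed"      # the original object stays mutable
$frozenName = $frozen.Name    # the snapshot keeps the value it captured
```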

Higher Order Functions

A higher-order function is a function that does at least one of the following:

Take one or more functions as an input
Output a function

So let's look at a simple test that will work out even and odd values in an array:

Describe "Higher Order Functions" {
    $values = @(1, 2, 3, 4)

    It "Should filter even" {
        $evenPredicate = { param($value) return $value % 2 -eq 0 }
        $even = Convert-ByFilter $values $evenPredicate
        $even.Length.Should.Be(2)
        $even[0].Should.Be(2)
        $even[1].Should.Be(4)
    }

    It "Should filter odd" {
        $oddPredicate = { param($value) return $value % 2 -eq 1 }
        $odd = Convert-ByFilter $values $oddPredicate
        $odd.Length.Should.Be(2)
        $odd[0].Should.Be(1)
        $odd[1].Should.Be(3)
    }
}

function Convert-ByFilter($values, $predicate) {
    # Invoke the predicate script block with & for each pipeline item.
    return $values | Where-Object { & $predicate $_ }
}

PowerShell has a concept of a script block, which lets us pass a function around as a value. The important thing to notice is that we are using a where expression and we are invoking the predicate with the call operator & (this took me a while to work out).
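
The test above only covers the first kind of higher-order function, one that takes a function as input. As a minimal sketch of the second kind, one that outputs a function, the hypothetical helper New-NegatedPredicate below (not part of the original code) takes a predicate and returns a new script block that gives the opposite answer.

```powershell
# Hypothetical helper: takes a predicate and returns its negation as a new script block.
function New-NegatedPredicate($predicate) {
    # GetNewClosure captures $predicate so the returned block can still call it later.
    return { param($value) -not (& $predicate $value) }.GetNewClosure()
}

$isEven = { param($value) $value % 2 -eq 0 }
$isOdd = New-NegatedPredicate $isEven

# The returned function can be used anywhere a predicate is expected.
$odd = @(1, 2, 3, 4) | Where-Object { & $isOdd $_ }
```

With the values 1 to 4 this should leave 1 and 3 in $odd.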

Currying

Currying is the technique of transforming a function that takes multiple arguments (or an n-tuple of arguments) in such a way that it can be called as a chain of functions each with a single argument.

This has proven to be a bit difficult in PowerShell; however, it is not impossible. Let's look at the example below:

Describe "Currying" {
    It "Should add two values via anonymous functions" {
        $add = { param($x) return { param($y) return $x + $y }.GetNewClosure() }
        $addFive = & $add 5
        $ten = & $addFive 5
        $ten.Should.Be(10)

        $ten = & (& $add 5) 5
        $ten.Should.Be(10)
    }

    It "Should add two values via named functions" {
        function add($x) { return { param($y) return $y + $x }.GetNewClosure() }
        $addFive = add 5

        $ten = & $addFive 5
        $ten.Should.Be(10)

        $ten = & (add 5) 5
        $ten.Should.Be(10)
    }
}

As we can see, a curried function can be created as an anonymous function or a named one. Unfortunately I could not find a clean way to represent this as a generic solution. The trick here is to always return a new closure. There is a JavaScript implementation that might be portable to PowerShell; maybe someone could help me with that.
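
For the two-argument case at least, a generic helper seems possible. The sketch below is an assumption on my part rather than a tested general solution, and ConvertTo-Curried is a hypothetical name. It relies on the same trick of always returning a new closure, plus one extra step: as far as I can tell GetNewClosure only captures variables that are local to the current scope, so the function is copied into a local before the inner closure is built.

```powershell
# Hypothetical helper: curries a script block that takes exactly two arguments.
function ConvertTo-Curried($function) {
    return {
        param($x)
        # Copy $function into a local variable so the inner closure captures it too.
        $f = $function
        return { param($y) & $f $x $y }.GetNewClosure()
    }.GetNewClosure()
}

$add = { param($a, $b) $a + $b }
$curriedAdd = ConvertTo-Curried $add

# Call as a chain of single-argument functions.
$addFive = & $curriedAdd 5
$ten = & $addFive 5
```

Functions with more than two arguments would need more nesting, which is where a clean generic solution gets harder.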

Lazy Evaluation

Lazy Evaluation is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations.

This fortunately is easy to create, as the wonderful people behind the .NET Framework created the class System.Lazy<T>. Let's have a look at some PowerShell code:

Describe "Lazy" {
    $lazy = New-Lazy { return "test" }

    It "Should not have a value evaluated" {
        $lazy.IsValueCreated.Should.Be($false)
    }

    It "Should get lazy value" {
        $lazy.Value.Should.Be("test")
        $lazy.IsValueCreated.Should.Be($true)
    }
}

function New-Lazy($script) {
    # Cast the script block to a delegate that System.Lazy can invoke on first access.
    $function = [System.Func[object]] $script
    $lazy = New-Object System.Lazy[object] $function

    return $lazy
}

All we need to do is create a Lazy object and call the Value property. The implementation will cache the value.
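
To see the caching, here is a small sketch that counts how often the script block runs. The $counter hashtable and the variable names are only for illustration. It builds the Lazy object the same way New-Lazy does and reads Value twice.

```powershell
# Illustrative counter: a hashtable is a reference type, so the closure and the
# calling code see the same instance.
$counter = @{ Calls = 0 }

# The factory records each run and then produces the value.
$factory = { $counter.Calls++; "test" }.GetNewClosure()
$lazy = New-Object 'System.Lazy[object]' ([System.Func[object]] $factory)

$callsBefore = $counter.Calls   # nothing has run yet
$first = $lazy.Value            # the first access runs the factory
$second = $lazy.Value           # the second access should reuse the cached value
$callsAfter = $counter.Calls
```

If the value is cached as described, $callsAfter ends up at 1 even though Value was read twice.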

Continuations

A continuation is an abstract representation of the control state of a computer program. What does that mean?

To me it was hard to understand why you would want such a complication. I found a great explanation in the following article of how you can use continuations in a real-world sense (nothing worse than thinking in abstract terms).

PowerShell does not have a feature like this (neither does .NET for that matter). The closest thing that PowerShell has is what is called Workflows. To look at how these are implemented I will need another blog post.

The other thing that we can look into is Continuation-passing style, which is described as:

A function written in continuation-passing style takes as an extra argument: an explicit "continuation" i.e. a function of one argument. When the CPS function has computed its result value, it "returns" it by calling the continuation function with this value as the argument.

This can be achieved with the Task Parallel Library (TPL) or Async/Await, neither of which works in PowerShell.
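
Leaving TPL and Async/Await aside, the style itself can be imitated with plain script blocks. The sketch below is only an illustration of the definition quoted above. The function names are hypothetical, and nothing here runs asynchronously. Each function takes an extra $continuation argument and hands its result to it instead of returning to the caller's expression.

```powershell
# Hypothetical CPS functions: each one passes its result to the continuation.
function Invoke-AddCps($x, $y, $continuation) {
    & $continuation ($x + $y)
}

function Invoke-MultiplyCps($x, $y, $continuation) {
    & $continuation ($x * $y)
}

# Compute (2 + 3) * 4: the continuation of the addition performs the
# multiplication, and the final continuation simply emits the product.
$result = Invoke-AddCps 2 3 {
    param($sum)
    Invoke-MultiplyCps $sum 4 { param($product) $product }
}
```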

Pattern Matching

Pattern matching is the act of checking some sequence of tokens for the presence of the constituents of some pattern. Pattern matching in PowerShell is accomplished using the switch statement.

In traditional imperative programming languages the switch statement is considered a code smell. Switch statements are frowned upon simply because they usually display signs of code duplication.

Pattern matching is encouraged in functional programming languages because it is much more powerful there and functions are first-class citizens.

Luckily for us, PowerShell has a strong pattern matching construct in the switch statement. Let's look at some examples:

Describe "Pattern Matching" {
    It "Should use a simple switch" {
        $a = 5
        $result = switch ($a) { 
            1 {"The colour is red."} 
            2 {"The colour is blue."} 
            3 {"The colour is green."} 
            4 {"The colour is yellow."} 
            5 {"The colour is orange."} 
            6 {"The colour is purple."} 
            7 {"The colour is pink."}
            8 {"The colour is brown."} 
            default {"The colour could not be determined."}
        }

        $result.Should.Be("The colour is orange.") 
    }

    It "Should use a wildcard switch" {
        $a = "d14151"

        $result = switch -wildcard ($a) { 
            "a*" {"The colour is red."} 
            "b*" {"The colour is blue."} 
            "c*" {"The colour is green."} 
            "d*" {"The colour is yellow."} 
            "e*" {"The colour is orange."} 
            "f*" {"The colour is purple."} 
            "g*" {"The colour is pink."}
            "h*" {"The colour is brown."} 
            default {"The colour could not be determined."}
        }

        $result.Should.Be("The colour is yellow.") 
    }

    
    It "Should use a regex switch" {
        $a = "r14151"

        $result = switch -regex ($a) { 
            "[a-d]" {"The colour is red."} 
            "[e-g]" {"The colour is blue."} 
            "[h-k]" {"The colour is green."} 
            "[l-o]" {"The colour is yellow."} 
            "[p-s]" {"The colour is orange."} 
            "[t-v]" {"The colour is purple."} 
            "[w-y]" {"The colour is pink."}
            "[z]" {"The colour is brown."} 
            default {"The colour could not be determined."}
        }

        $result.Should.Be("The colour is orange.") 
    }
}

As we can see this is quite powerful, as the switch statement can be used as an expression whose result is assigned to a variable.
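
The switch statement also accepts a script block as a condition, which behaves a little like a guard in functional pattern matching. The sketch below uses a hypothetical Get-Sign function to show the idea. Inside the condition, $_ is the value being tested.

```powershell
# Hypothetical example: classify a number using a script block condition.
function Get-Sign($number) {
    switch ($number) {
        { $_ -lt 0 } { "negative" }   # the case matches when the block returns $true
        0            { "zero" }
        default      { "positive" }   # runs only when nothing else matched
    }
}

$negative = Get-Sign (-5)
$zero = Get-Sign 0
$positive = Get-Sign 7
```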

Closures

A closure (also lexical closure or function closure) is a function together with a referencing environment for the non-local variables of that function.

Closures in PowerShell are created with the GetNewClosure method of a script block.

The new script block is closed over the local variables in the scope that the closure is defined in. In other words, the current values of the local variables are captured and enclosed inside the script block bound to the module.

You have seen this in the previous examples.
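
As a small sketch of what "captured" means here, compare a plain script block with one returned by GetNewClosure. The variable names are only for illustration.

```powershell
$greeting = "hello"

# The closure takes a copy of $greeting as it is right now.
$closure = { $greeting }.GetNewClosure()
# A plain script block looks the variable up when it is invoked.
$plain = { $greeting }

$greeting = "goodbye"

$fromClosure = & $closure   # uses the captured value
$fromPlain = & $plain       # uses the current value
```

Here $fromClosure holds "hello" while $fromPlain holds "goodbye".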

Hopefully this has been an interesting journey through some of the functional programming concepts and how they can be applied to PowerShell. I will try to build a module that makes it easier to use some of these constructs.

Wednesday 8 August 2012

PowerShell - Did You Know - Scopes

You cannot depend on your eyes when your imagination is out of focus. - Mark Twain

I am a big fan of PowerShell; this is a rare occasion on which Microsoft gets something right. If you want to learn more about PowerShell I suggest you buy this really great book or follow this blog. I thought I would start a series that outlines some interesting and wonderful things I find about the language.

Today one of my colleagues at work asked a wonderful question about scopes, which left me puzzled (he is a JavaScript guru, so I didn't understand why he didn't get it ;) ).

If you read through the official documentation about scopes you might get a bit confused like I did.

Local:

The current scope. The local scope can be the global scope or any other scope. -- Hmmm

As we can see from the documentation, PowerShell does not mention a concept of function scope like in JavaScript. If we take the JavaScript example and turn it into PowerShell we might get a surprise:

$sport = "baseball"
$player = $null

function Get-Player() {
    if ($sport -eq "baseball") {  
        $player = "Evan Longoria"; # (the baseball player)  
    } 
    else {  
        $player = "Eva Longoria"; # (the actress)  
    }

    $player2 = "Derek Jeter";  
    return $player;
}

Get-Player
Write-Host $player

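The $player variable is still $null, which is not what I expected. The assignment inside Get-Player creates a new $player that is local to the function instead of changing the outer one. However, if we define a function within a function, the variable is available in the inner scope.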

$sport = "baseball"
$player = $null

function Get-Player() {
    if ($sport -eq "baseball") {  
        $player = "Evan Longoria"; # (the baseball player)  
    } 
    else {  
        $player = "Eva Longoria"; # (the actress)  
    }

    $player2 = "Derek Jeter";  

    function Inner-Function {
        Write-Host $player
        Write-Host $player2
    }

    Inner-Function

    return $player;
}

Get-Player

If we prefix the $player2 variable with the $script: scope modifier, it is created in the script scope and is still in scope after the function returns.

$sport = "baseball"
$player = $null

function Get-Player() {
    if ($sport -eq "baseball") {  
        $player = "Evan Longoria"; # (the baseball player)  
    } 
    else {  
        $player = "Eva Longoria"; # (the actress)  
    }

    $script:player2 = "Derek Jeter";  
    return $player;
}

Get-Player
Write-Host $player2

Another interesting example is

$sport = "baseball"
$player = $null

function Get-Player() {
    if ($sport -eq "baseball") {  
        $player = "Evan Longoria"; # (the baseball player)  
    } 
    else {  
        $player = "Eva Longoria"; # (the actress)  
    }

    $player2 = "Derek Jeter";

    Print-Player
    
    return $player;
}

function Print-Player {
    Write-Host $player2
}

Get-Player

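The variable $player2 is visible in Print-Player, even though it is defined in Get-Player. PowerShell scopes are dynamic: a called function's scope is a child of its caller's scope, so it can read the caller's variables.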
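
The scope rules from these examples can be condensed into one sketch. The function names are hypothetical. I am assuming the code is saved and run as a script file, so that the top-level $player lives in the script scope.

```powershell
$player = "nobody"

# Reading: a function can see variables from the scope that called it.
function Read-Player { return $player }

# Writing: a plain assignment creates a new variable local to the function.
function Set-PlayerLocal { $player = "Evan Longoria" }

# Writing with the script: modifier targets the script scope instead.
function Set-PlayerInScript { $script:player = "Derek Jeter" }

$seenByChild = Read-Player

Set-PlayerLocal
$afterLocal = $player     # unchanged by the local assignment

Set-PlayerInScript
$afterScript = $player    # changed when the top level is the script scope
```

Run as a script, I would expect $seenByChild and $afterLocal to both be "nobody" and $afterScript to be "Derek Jeter".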

I hope that you find this as interesting as I have. Make sure you tune in to the next episode of PowerShell - Did You Know.

Tuesday 7 August 2012

Release It!: Design and Deploy Production-Ready Software

"A wise man learns by the mistakes of others, a fool by his own." - Latin Proverb

This book teaches you that “Feature complete” is not the same as “production ready”.

The summary at the end of the book sums up exactly what I feel about software development:

Change is the defining characteristic of software. That change – that adaptation – begins with release. Release is the beginning of the software's true life; everything before the release is gestation.

This book shows you how to really think about the design decisions that you make in any project.

Release 1.0 is the beginning of your software's life. Your quality of life after 1.0 depends on choices you make long before that vital milestone.

The reality of most software is that we mean well when we build it, however we often get this wrong.

New software emerges like a new university student, full of optimistic vigour, suddenly facing the harsh realities of the world outside the development environment. Things happen in the real world that just don't happen during development, usually bad things.

The book is broken down into four sections:

Stability
Capacity
General Design Issues
Operations

Stability

The book defines stability as follows:

A stable system is one that keeps processing transactions, even when there are transient impulses, persistent stresses, or component failures disrupting normal processing.

An impulse is a rapid shock to the system and stress is a force applied to the system over an extended period.

Sudden impulses and excessive strain can trigger catastrophic failure. These failures can expose cracks in the system. These cracks are called failure modes. When a catastrophic failure occurs there is always a chain of failures that caused it. One has to realise that the events are not independent, as there is always a layer of coupling.

Some interesting Stability Anti-patterns:

Blocked threads – Don't hold onto resources.
Attacks of Self-Denial – kicking yourself while you are down.
Unbalanced capacities – Make sure that all systems can take each other's load.
Slow Responses – These usually result from excessive demand.

To combat these here are some interesting Stability patterns:

Use Time-outs – Networks will always be unreliable; well-placed time-outs provide fault isolation.
Circuit Breaker – Protect your system from all manner of integration point problems. If the integration point is down, stop calling it!
Fail Fast – If your system can't meet its SLA, inform callers quickly. Check resource availability at the start of the transaction.
Test Harness – Make sure that you simulate real-world failure modes. A great recent tool for this is Chaos Monkey.
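
To make the Circuit Breaker idea concrete, here is a deliberately simplified sketch. It is my own illustration and not code from the book. The function names, the failure threshold and the hashtable that holds the state are all assumptions. A real breaker would also try the call again after a time-out, and this sketch leaves that out.

```powershell
# Hypothetical breaker state: counts consecutive failures against a threshold.
function New-CircuitBreaker($threshold) {
    return @{ Failures = 0; Threshold = $threshold }
}

function Invoke-WithBreaker($breaker, $action) {
    # Open circuit: fail fast without touching the integration point.
    if ($breaker.Failures -ge $breaker.Threshold) {
        throw "Circuit open: the integration point is not being called"
    }

    try {
        $result = & $action
        $breaker.Failures = 0    # a success closes the circuit again
        return $result
    }
    catch {
        $breaker.Failures++      # record the failure and pass the error on
        throw
    }
}

$breaker = New-CircuitBreaker 2
$attempts = @{ Calls = 0 }
$flakyCall = { $attempts.Calls++; throw "integration point is down" }

# Three attempts are made, but the third never reaches the failing call.
foreach ($attempt in 1..3) {
    try { Invoke-WithBreaker $breaker $flakyCall } catch { }
}
$callsMade = $attempts.Calls
```

After two failures the breaker stops invoking the script block, so $callsMade stays at 2.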

By understanding the stability anti-patterns and the stability patterns described in the book one can prevent these cracks propagating through our layers in the system.

Capacity

The book defines capacity as follows:

Capacity is the maximum throughput a system can sustain, for a given workload, while maintaining an acceptable response time for each individual transaction.

Throughput describes the number of transactions the system can process in a given time span.

The hardest thing about dealing with capacity is working with non-linear effects. In every system, exactly one constraint determines the system's capacity. It is important to understand these constraints.

Along with capacity come some myths. I really never thought about these:

CPU is cheap – In reality, 250 milliseconds per transaction adds up to 69.4 hours of CPU time every day (which works out to about 1 million transactions a day).
Storage is cheap – Storage is a service not a device.
Bandwidth is cheap – Dynamically generated pages tend to have a lot of junk in them. 1K of junk per page equates to 1GB of junk with 1 million page views a day.
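
The arithmetic behind the CPU and bandwidth figures can be checked with a few lines. The volume of 1 million transactions a day for the CPU figure is my assumption, chosen because it reproduces the 69.4 hours. The variable names are only for illustration.

```powershell
$perDay = 1000000                  # assumed transactions (and page views) per day

# CPU: 250 milliseconds per transaction, converted to hours per day.
$cpuSecondsPerTransaction = 0.25
$cpuHoursPerDay = [math]::Round($perDay * $cpuSecondsPerTransaction / 3600, 1)

# Bandwidth: 1K of junk per page, converted to gigabytes per day.
# PowerShell's 1KB and 1GB literals are powers of two (1024 and 1024^3).
$junkGigabytesPerDay = [math]::Round($perDay * 1KB / 1GB, 2)
```

This gives 69.4 CPU hours and roughly 0.95 binary gigabytes, which is the "1GB" in round numbers.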

Some interesting Capacity Anti-patterns:

AJAX Overkill – Don't return small HTML and then send 400 requests back to your server to get the rest. Best to return the JSON to the browser on the first request.
The Reload Button – Fast sites don't provoke the user into hitting the Reload button.
Cookie Monsters – Don't store your database in a cookie.

To combat these here are some interesting Capacity patterns:

Use Caching Carefully – Limit your cache size and cache expensive objects.
Pre-compute Content – If generating the content is expensive, process it offline.

I recently tried to apply the wasted space remover pattern in one of our projects. I was able to remove just under 10k worth of white-space (though I did manage to introduce bugs, which frustrated the team). However, after discussing it with the team, compression seems to take care of most of it. I still think it is important for clients that don't support compression (I know there are fewer and fewer all the time). This article seems to have some great statistics around why you shouldn't bother.

By understanding the capacity anti-patterns and the capacity patterns described in the book one can understand how to fine tune the system. This is achieved by an ongoing process of monitoring.

Capacity is fundamentally a measure of how much revenue the system can generate during a given period of time.

General Design Issues

There are many great topics in this section, however the one that I found really interesting is availability.

Availability of a system is typically measured as a factor of its reliability - as reliability increases, so does availability. However, no system can guarantee 100.000% reliability; and as such, no system can assure 100.000% availability – Wikipedia

An interesting take on availability is discussed in one of the stability anti-patterns called SLA inversion.

SLA inversion states that unless every one of your dependent systems is engineered for the same SLA you must provide, then the best you can possibly do is the SLA of the worst dependent system.

It gets even worse than that statement. If built naively, the probability of a system failing is the joint probability of a failure in any component or service. This means that if your system has five external services that each have 99.9% availability, then the best your system can do is about 99.5%. Assuming the services fail independently, their availabilities multiply: 0.999 × 0.999 × 0.999 × 0.999 × 0.999 ≈ 0.995.
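
The same calculation as a small sketch, assuming the services fail independently so that their availabilities can simply be multiplied. Get-CompositeAvailability is a hypothetical helper name.

```powershell
# Hypothetical helper: multiplies the availabilities of independent services.
function Get-CompositeAvailability($availabilities) {
    $result = 1.0
    foreach ($availability in $availabilities) {
        $result *= $availability
    }
    return $result
}

# Five external services at 99.9% each.
$best = Get-CompositeAvailability @(0.999, 0.999, 0.999, 0.999, 0.999)
$bestPercent = [math]::Round($best * 100, 2)
```

$bestPercent comes out at 99.5, so each extra dependency lowers the ceiling a little further.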

Operations

This chapter begins with a great story around when it rains it pours and how they dealt with it. Really interesting. The topic that really meant something to me in this section was Transparency.

Experienced engineers on ships can tell you when something is about to go wrong by the sound of the giant diesel engines. Transparency refers to the qualities that allow operators, developers and business sponsors to gain understanding of the system's historical trends, present conditions, instantaneous state and future projections.

Designing for transparency is really important as adding transparency late in the development is about as effective as adding quality.

Some great ideas around transparency are as follows:

Monitoring and reporting systems should be built around your system, not in it. Better to expose than to couple to a service.
Make sure you discuss what triggers alerts.
Logging is still very important to this day and what you log is more important. A pet hate of mine is a system that tells you everything is OK in log files. Log files should only be used as a way to see what is going wrong with the system (what are your thoughts around this?).
It is important to also get the logging levels right. I personally think that in production nothing over WARNING should be allowed.
Understanding all the messages your system will produce is important. This is easier if you built the whole application. Message codes simplify the communication between development and operations.
Remember that in the end all of your decisions need to be understood by humans. When stressful situations occur the last thing you want is to try to decipher what the system is trying to tell you.

Conclusion

This has been a fantastic read and I really recommend it to anyone who wants to really think about what happens to your system once it goes into the wild. A lot of the issues these days are being addressed by cloud providers. This book really shows some of the early work on problems that the DevOps movement is trying to solve, so I applaud it. If you want more information check out this blog.

Tuesday 24 July 2012

Introduction

“I’ve failed over and over and over again in my life and that is why I succeed.” - Michael Jordan

I have always wanted to start writing a blog about technical information that I find interesting, having been inspired by many people. Hopefully these topics are of interest to you.

So a little about me. I have been in the IT industry for the last 12 years and I can say that I am obsessed with my line of work; for a more detailed look, check out LinkedIn. Some topics that interest me are:

I am fanatical about reading books. I love learning from people who are smarter than me. I thought I would take this forum to summarise what I have learnt and at the same time improve my writing. I am very excited about the journey.