Monday, 23 January 2012

The Cost Of Following An SDL

When you're a company that is both a consultancy, providing advice and code review services around software security, and an independent software developer, there is always the danger of not practising what you preach. The problem is compounded when resources aren't as abundant as in larger companies, and time is ideally spent either on paid consultancy or on writing new features for products. As a result we think we make a good Petri dish, so we're happy to talk openly about our experience, and the time costs we incurred, following a streamlined Security Development Lifecycle (SDL) during a recent development effort.

The story began when we set out to write our newest product (Recx GPS Image Forensics for Maltego). Rigorously following a traditional SDL in the early stages of a project may consume too many resources to be economically viable. Instead we decided to keep an 'SDL diary' (new buzzword alert) to maintain security mindfulness and to track how much development time security cost us. Any shared resource (such as a wiki) will do: entries can be easily reviewed and recorded, and can later serve as the basis for future documentation.

Simply keeping the SDL in mind during the early development/research cycle can steer a project in the right direction and ease the transition to a fully imposed SDL. What follows are notes from our SDL diary. Before product development started, we knew what we wanted to build and had settled on an open source component to build upon.

Security Requirements
Before starting the design or implementation, our security requirements were:
    1. All binaries should be code signed and verifiable as coming from us.
    2. All new code should be written in a managed language to minimise the risk of certain classes of vulnerability.
    3. All available OS and compiler/linker mitigations against the exploitation of arbitrary code execution vulnerabilities in native code should be leveraged where needed.
    4. High risk native processes that work on externally supplied data should run with minimal privileges.
    5. High risk native processes should be blocked from communicating directly with the network.
    6. A crash while processing any single image should not halt the processing of others.
    7. Input extracted from external data should not be trusted when generating output.
You'll notice we didn't stipulate a requirement to identify possible memory corruption vulnerabilities in native code. The reason? We knew from the start that we had to contend with 20MB of C/C++ source code from open source components (GPL people: we've included the source in the installation binary to comply with the license).

Why not review that code, either manually or with automated tools? Our risk analysis concluded that, if we implemented the other requirements, the likelihood of any issues being successfully exploited was low based on current understanding (that is our assumption). On top of that, the process would run with the lowest possible privileges, so the residual risk was worth bearing compared with the effort of a review. We accept that mitigations are not a replacement for secure coding practices, but we felt they were sufficient given our deployment model.

If at this point you're questioning whether someone would really target forensics software, we encourage you to read the 2007 paper from iSEC Partners on attacking forensics software.

Design / Functional Requirements
The first requirement meant obtaining code signing keys, which was easy enough to do. We decided to use C# to satisfy the second requirement.
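
As a concrete illustration of the first requirement, here is a minimal sketch that builds the signtool.exe command lines for two tasks. The first signs a binary with a SHA-256 digest and an RFC 3161 timestamp. The second verifies the result against the default Authenticode policy. It is written in Python purely for illustration (the product itself used C#). The certificate path, timestamp URL and executable name are hypothetical placeholders, and the exact signing options used for the product aren't recorded here.

```python
import subprocess
import sys

# Hypothetical placeholders: substitute your own certificate and timestamp service.
CERT_PFX = r"C:\keys\codesign.pfx"
TIMESTAMP_URL = "http://timestamp.example.com"


def signtool_sign_args(binary: str, pfx: str = CERT_PFX, ts_url: str = TIMESTAMP_URL) -> list[str]:
    """signtool arguments to sign `binary` with SHA-256 and an RFC 3161 timestamp."""
    return [
        "signtool", "sign",
        "/f", pfx,        # PFX holding the code signing certificate (add /p if password protected)
        "/fd", "SHA256",  # file digest algorithm
        "/tr", ts_url,    # RFC 3161 timestamp server
        "/td", "SHA256",  # timestamp digest algorithm
        binary,
    ]


def signtool_verify_args(binary: str) -> list[str]:
    """signtool arguments to verify `binary` using the default Authenticode policy (/pa)."""
    return ["signtool", "verify", "/pa", "/v", binary]


if __name__ == "__main__" and sys.platform == "win32":
    target = "ImageProcessor.exe"  # hypothetical executable name
    subprocess.run(signtool_sign_args(target), check=True)
    subprocess.run(signtool_verify_args(target), check=True)
```

Keeping the argument lists in small functions makes it easy to run the same verify step in the build and again in the installer's own tests.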

To satisfy the third requirement we wanted our binaries to be 'Recx SDL Windows Binary Auditor' clean (a soon-to-be-released product currently in the final throes of legal review) and, to be doubly sure, Microsoft BinScope clean. Meeting this requirement meant that, instead of using the typical GCC-produced binaries available from the open source projects, we would recompile them all with Visual Studio 2010 to leverage all the available defensive compiler, linker and OS features.

As we've mentioned, when we conceived the product we decided to use an open source component, a mass of C++, for the image parsing. Our threat modelling exercise told us that this represented the biggest attack surface. We also recognized it was highly unlikely that we'd have the resources to perform static or manual analysis of it. As a result we designed the solution to run this functionality in a self-contained process that could run at Low Integrity under Windows. The idea is that, if the process is successfully compromised, the impact on the overall system is hopefully minimized (if the OS does what it says on the tin), satisfying our fourth requirement. For the fifth requirement we use the built-in firewall in Windows Vista/7/Server 2008, creating a rule during installation that specifically blocks our image processor from communicating with the network.
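
The installer step for the fifth requirement could look something like the sketch below. It builds `netsh advfirewall firewall add rule` commands that block the image processor's traffic. Blocking inbound as well as outbound traffic is a choice made for this example. This is an assumption about how such a rule might be created, written in Python for illustration. The rule name and install path are hypothetical. netsh must run elevated, and it is worth confirming on a real install that paths containing spaces reach netsh quoted the way it expects.

```python
import subprocess
import sys


def firewall_block_rule_args(rule_name: str, program: str, direction: str) -> list[str]:
    """netsh arguments for a Windows Firewall rule that blocks `program` in one direction."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",
        f"dir={direction}",
        "action=block",
        f"program={program}",
        "enable=yes",
    ]


def install_firewall_rules(program: str, rule_name: str = "ImageProcessor-BlockNetwork") -> None:
    # One rule per direction; netsh permits several rules sharing a name.
    for direction in ("out", "in"):
        subprocess.run(firewall_block_rule_args(rule_name, program, direction), check=True)


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical install path; run from an elevated installer.
    install_firewall_rules(r"C:\Program Files\Example\ImageProcessor.exe")
```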

To satisfy the sixth requirement we decided to spawn a new native process to parse each image. This also has the benefit of hindering exploitation techniques that rely on multiple images being parsed consecutively within the same process.
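
The process-per-image design can be sketched as a loop that starts a fresh worker for every image. A non-zero exit code or a timeout counts as a failure for that image alone. This is a Python illustration rather than the product's C# code. `parse_images_isolated`, `ParseResult` and the worker command are hypothetical names. The timeout is an added assumption, since a hung parser would stall a batch just as surely as a crash would.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ParseResult:
    image: str
    ok: bool
    output: str
    error: str


def parse_images_isolated(images: list[str], worker_cmd: list[str],
                          timeout_s: float = 30.0) -> list[ParseResult]:
    """Parse each image in its own short-lived worker process.

    A crash, non-zero exit or hang in one worker is recorded as a failure for
    that image only; the loop then moves on to the next image.
    """
    results = []
    for image in images:
        try:
            proc = subprocess.run(
                worker_cmd + [image],  # one fresh process per image
                capture_output=True,   # the worker reports its findings on stdout
                text=True,
                timeout=timeout_s,     # run() kills the worker if it hangs
            )
        except subprocess.TimeoutExpired:
            results.append(ParseResult(image, False, "", "timed out"))
            continue
        if proc.returncode == 0:
            results.append(ParseResult(image, True, proc.stdout, ""))
        else:
            # On Windows an access violation typically shows up as exit code 0xC0000005.
            results.append(ParseResult(image, False, proc.stdout, f"exit code {proc.returncode}"))
    return results


if __name__ == "__main__":
    # Stand-in worker for illustration: echoes its argument and "crashes" on bad.jpg.
    demo_worker = [sys.executable, "-c",
                   "import sys; print(sys.argv[1]); sys.exit(3 if 'bad' in sys.argv[1] else 0)"]
    for r in parse_images_isolated(["a.jpg", "bad.jpg", "c.jpg"], demo_worker):
        print(r.image, "ok" if r.ok else r.error)
```

The stand-in worker exits with code 3 for `bad.jpg`, so the second image is reported as a failure while the first and third still succeed.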

The final requirement translated into a functional requirement that CDATA sections be used for all image-originating data written to the XML output consumed by Maltego.
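
One subtlety with the CDATA approach is that image data containing the sequence `]]>` would end the section early, letting the rest of the string be parsed as markup. CDATA also cannot carry characters that XML 1.0 forbids outright, such as NUL. The Python sketch below (an illustration, not the product's code) handles both cases. It strips forbidden characters and splits any `]]>` across two adjacent CDATA sections. The element name `Value` is hypothetical and not taken from Maltego's schema.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 Char production cannot appear even inside CDATA.
_INVALID_XML_CHARS = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def to_cdata(value: str) -> str:
    """Wrap untrusted text in CDATA so it cannot break out into markup."""
    cleaned = _INVALID_XML_CHARS.sub("", value)
    # "]]>" would close the section early, so split it across two sections.
    return "<![CDATA[" + cleaned.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def field_xml(tag: str, untrusted_value: str) -> str:
    """Emit <tag>...</tag>; `tag` must be a fixed, trusted element name."""
    return f"<{tag}>{to_cdata(untrusted_value)}</{tag}>"


if __name__ == "__main__":
    doc = field_xml("Value", "GPS]]><evil/>")  # hypothetical element name
    print(doc)
    print(ET.fromstring(doc).text)  # the original string round-trips as plain text
```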

For the implementation phase we performed the following security-specific tasks:
• Ensured we adhered to our CDATA usage functional requirement when producing XML output.
• Static analysis of all our C# code using FxCop [1] [2]:
  • 218 messages with all rules enabled.
  • 0 messages with only the security and security transparency rules enabled.
• Use of Visual Studio 2010 to recompile all of the native GPL code with the appropriate compiler and linker options.
• Use of strict_gs_check everywhere, as the performance impact was negligible in our use case.
• Implementation of running the image parsing as a low integrity process.

Now for a little digression (sort of). The standard .NET API has, simply put, not kept up with Windows Vista / Windows 7 security features. If you're using the C# Process class, there is no easy way to launch a process with a custom token, and thus at low integrity. Microsoft do provide an example of how to create a low integrity process in .NET in KB2278183, using CreateProcessAsUser. However, this example doesn't support redirecting stdin and stdout, and is frankly a tiny bit clunky compared with the standard Process class. Others have overcome a similar problem by calling CreateProcessAsUser from C# and redirecting stdin and stdout themselves, but the result was messy compared with the standard .NET classes, so we decided not to go with that approach. It's worth pointing out that Microsoft's own private code in System comes close to doing what we need, for example the private method:
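    // System.Diagnostics.Process (decompiled excerpt; only the relevant call is shown)
    private bool StartWithCreateProcess(ProcessStartInfo startInfo)
    // ...within the method body. bInheritHandles (5th argument) is true, so redirected stdin/stdout
    // handles reach the child; no token is passed, so the child runs in the caller's security context.
    NativeMethods.CreateProcess(null, stringBuilder, null, null, true, num2, intPtr, text, sTARTUPINFO, pROCESS_INFORMATION);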
So in our humble opinion it shouldn't be a huge leap to provide a supported public method that allows launching low integrity processes in .NET with all the niceness of the existing Process class. So Santa, if you're listening, I think you know what we're asking for. Anyway, the solution we used in the end? We used icacls.exe during installation to set a low integrity label on the image processing executable, so that it runs as low integrity whenever it is launched.
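
The icacls approach works because Windows starts a new process at the lower of two levels: the parent's integrity level and the mandatory label on the executable file. Labelling the file Low is therefore enough, with no need for CreateProcessAsUser. Below is a minimal Python sketch of the installer step, assuming it runs elevated. The executable path is a hypothetical placeholder. The "Mandatory Label\Low Mandatory Level" text is what icacls typically prints for such a file, but it is worth confirming on the target system.

```python
import subprocess
import sys


def low_integrity_label_args(exe_path: str) -> list[str]:
    """icacls arguments that give `exe_path` a Low mandatory integrity label.

    A process started from an executable labelled Low runs at Low integrity,
    because Windows uses the lower of the parent's level and the file's label.
    """
    return ["icacls", exe_path, "/setintegritylevel", "Low"]


def apply_low_integrity_label(exe_path: str) -> str:
    subprocess.run(low_integrity_label_args(exe_path), check=True)
    # Listing the ACL afterwards should include a "Mandatory Label\Low Mandatory Level" entry.
    listing = subprocess.run(["icacls", exe_path], capture_output=True, text=True, check=True)
    return listing.stdout


if __name__ == "__main__" and sys.platform == "win32":
    print(apply_low_integrity_label(r"C:\Program Files\Example\ImageProcessor.exe"))  # hypothetical path
```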

For the verification phase we did a number of different things:
• Ensured our XML output used CDATA for image-originating data, so as not to provide an XML injection vector:
  • via manual inspection of the code coupled with inspection of the XML output.
• Ensured the image processing component actually ran as low integrity.
• Used 'Recx SDL Windows Binary Auditor' to ensure the native binary came up clean.
• Used Microsoft BinScope to ensure the native binary came up clean.
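
BinScope and the Recx auditor perform many checks. The Python sketch below reproduces only a tiny slice of them. It reads the DllCharacteristics field of the PE optional header to confirm that a native binary was linked with /DYNAMICBASE (ASLR) and /NXCOMPAT (DEP). The scope is deliberately narrow: it does not check /GS, SafeSEH or anything else that needs the load configuration directory or PDB symbols. `check_mitigations` is a hypothetical helper name.

```python
import struct
import sys

# DllCharacteristics bits from the PE/COFF specification.
DLLCHAR_FLAGS = {
    "ASLR (/DYNAMICBASE)": 0x0040,  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE
    "DEP (/NXCOMPAT)": 0x0100,      # IMAGE_DLLCHARACTERISTICS_NX_COMPAT
}


def dll_characteristics(data: bytes) -> int:
    """Return the DllCharacteristics field of a PE32 or PE32+ image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20                             # skip signature and COFF file header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (0x10B, 0x20B):                      # PE32 / PE32+
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    # DllCharacteristics sits at offset 70 of the optional header in both formats.
    (chars,) = struct.unpack_from("<H", data, opt + 70)
    return chars


def check_mitigations(data: bytes) -> dict[str, bool]:
    chars = dll_characteristics(data)
    return {name: bool(chars & bit) for name, bit in DLLCHAR_FLAGS.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, enabled in check_mitigations(f.read()).items():
            print(f"{name}: {'yes' if enabled else 'NO'}")
```

Pass the path of a native binary as the first argument; each flag is printed with yes or NO.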

Using BinScope we had ~20 'failures' against the 'GSFunctionOptimizeCheck' check, with an error similar to:

[Screenshot: BinScope GSFunctionOptimizeCheck failure]
This is an interesting failure for several reasons:
• The check isn't explained in the BinScope help.
• Microsoft don't provide any indication of how to fix it.
• It appeared in large numbers in the debug build, which is to be expected, as debug builds disable optimization by default.
• In the release build there was a single instance, in a function with only one local variable, two input parameters and no output parameters, so we put it down to a false positive.

People will likely ask why we didn't do any fuzzing. Again, given our security requirements and design decisions, we felt it would be like manually reviewing the code or using static analysis: of some value, but given the other mitigations, likely a lot of work for little extra value based on our current risk analysis.

Release / Response Planning
Even having done everything discussed above, we knew we needed to prepare for the worst case. As a result we followed the standard practice of establishing a secure@ e-mail address so that third parties can report any security issues they find.

Sustainment / Ongoing Actions
We recognize there is some not-so-obvious security debt associated with using open source. There is also some obvious debt, such as the need to code review all that C/C++ code once we're a massive success, repaying some of the security debt we've incurred. As they say, what you don't pay for upfront you end up paying for in the end. In our case the non-obvious security cost is the ongoing need to monitor three open source projects for new releases, and to review those releases to check whether they resolve obvious or not-so-obvious security issues. What does this mean in practice?
• Monitoring the release pages for each project.
• Reviewing the change logs of new releases, even those not accompanied by a security advisory, to see whether they fix security defects that might not be obvious.
All of which adds to the ongoing sustainment cost.
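
Reviewing change logs for silent security fixes is ultimately manual work, but a crude keyword filter can help decide which lines to read first. The Python sketch below assumes plain-text change logs. The keyword list is a hypothetical starting point, and a line that doesn't match is not evidence that a release is safe.

```python
import re

# Hypothetical keyword list: terms that often hint at a quietly fixed security bug.
SUSPICIOUS = re.compile(
    r"\b(overflow|overrun|out[- ]of[- ]bounds|use[- ]after[- ]free|double free|"
    r"crash|segfault|heap|stack|integer|sanitiz|fuzz|CVE-\d{4}-\d+|security)",
    re.IGNORECASE,
)


def flag_changelog_entries(changelog: str) -> list[str]:
    """Return the changelog lines that deserve a closer security look."""
    return [line.strip() for line in changelog.splitlines() if SUSPICIOUS.search(line)]


if __name__ == "__main__":
    sample = "Add PNG support\nFix crash on truncated EXIF block\nUpdate translations\n"  # hypothetical log
    for line in flag_changelog_entries(sample):
        print("review:", line)
```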

The Cost To Date
The total cost of following a streamlined SDL for our product was ~14% of our time prior to release. This can be a substantial cost for a product that has not yet become profitable, and we are currently testing new ways to balance this overhead without incurring too much 'security debt'.
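
Because the diary recorded security work separately, the headline figure is simply a ratio of logged time: security-related hours divided by all hours spent on the project. Below is a minimal Python sketch, assuming each diary entry records its hours and whether the work was security related. This structure is hypothetical, since the diary's format isn't described beyond a shared resource such as a wiki. The example entries are invented for illustration and are not the project's data.

```python
from dataclasses import dataclass


@dataclass
class DiaryEntry:
    task: str
    hours: float
    security: bool  # True if the time went on an SDL/security activity


def security_share(entries: list[DiaryEntry]) -> float:
    """Fraction of total logged project time spent on security activities."""
    total = sum(e.hours for e in entries)
    if total == 0:
        return 0.0
    return sum(e.hours for e in entries if e.security) / total


if __name__ == "__main__":
    # Invented entries, purely to show the calculation.
    entries = [
        DiaryEntry("EXIF parsing feature", 30, False),
        DiaryEntry("Recompile native code with hardening flags", 4, True),
        DiaryEntry("Maltego transform output", 16, False),
    ]
    print(f"{security_share(entries):.0%} of logged time was security work")
```

As the comments below point out, a ratio like this measures effort spent, not effort saved.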

Footnotes
Something we would have liked to do, but which is slated for a future release:
• Distribute EMET and a configuration for our native process, to provide mitigations to XP users.


1. Thanks for the article. Good reading. I'm keen to know how you came to the 14% figure. Does it factor in time saved resolving security issues that, thanks to your use of an SDL, have not arisen? Or is it purely a case of you spending 14% longer on the project? In which case, do you feel it was worthwhile, and what is the upside? And ultimately, having presumably not developed this previously using an SDL, how do you know it was 14%? I'm keen to follow a similar exercise but don't know where to start in terms of measuring the benefit/loss.


  1. The 14% figure was the percentage of total project effort spent specifically on work associated with the security mindfulness activities. Nothing else, such as potential future savings, is factored into this figure.

     Without a different team developing the same product in isolation, we can't gauge how much we've saved compared with doing nothing, as that is highly dependent on a number of variables. For example, if the product is a flop and no one uses it, then the likelihood of security issues being discovered that we have to resolve goes down, so nothing was saved and it was pure cost. Also, if the other team was also security aware and just did the right thing without thinking about it or breaking it out, the cost becomes blended. It's only because we specifically kept track that we could point to the total project time spent on security-related activities.

     Was it worthwhile? It was the right thing to do. But we'll see how the product sells before we answer whether it was worthwhile.