Building Real Software: 2009

Wednesday, December 23, 2009

New Communities for Project Managers and Security

The other day I happened across a new Q&A community forum for project managers called AskAboutProjects.com. This site is built using the Stack Overflow Knowledge Exchange Engine, the same platform that is used to host the popular software development Q&A site Stack Overflow and Server Fault, a similar resource for IT system administrators.

The Stack Overflow engine is an effective and low cost platform for quickly building communities. It has some quirks, many of them around its security model, which make it awkward to use at times: for example it is difficult to enter links in answers or profiles (sometimes it works well, sometimes it doesn’t). I haven’t taken the time to figure out why, and I shouldn’t need to: the UI should be more seamless. And Firefox’s NoScript plug-in (I never leave home without it) occasionally catches XSS problems on some of the sites.

But the community experience is addictive – I find myself spending way too much time scanning the boards and offering help where I can. Some of the programmers on my team have found StackOverflow handy when working with new technology or debugging obscure technical problems.

There is something of a gold rush going on, with people hurrying to setup new communities using this engine: there are communities being launched for gamers, amateur radio, technology support forums, sports betters, dating, industrial robots, the iPhone, travel, diving, professional stock traders, musicians, real estate, organizational psychology, aerospace engineering, startups, world cup soccer, natural living, electronics, climate change, mountain biking, money, moms, spirituality…. you name it.

Another one of these sites that I am following is SecurityCrunch, a new community focused on IT security issues.

Of course there is no guarantee which communities will catch on. AskAboutProjects is new and the community is still small. Many of the forum questions so far are either homework assignments (which plague Stack Overflow as well) and seed questions from the founders of the community. Although it appears to be intended as a general resource for project managers, it is clearly focused at the moment on IT, and more specifically software development, projects and related issues, reflecting the founders’ backgrounds.

It will be worth keeping an eye on these communities over the next few months, to see which, if any of them, can replicate the success of Stack Overflow.

Friday, December 11, 2009

Much ado about... nothing much (Agile Vancouver 2009)

In early November I attended Much Ado About Agile, the Agile Vancouver interest group’s annual conference. I was looking for a short break, and this conference offered a chance to get away from daily responsibilities, reflect, and learn more about the state of the art in software development.

I’ve decided to look back to see what stayed with me, what I learned that was worth taking forward.

First off, it was grand being back in Vancouver – I lived in Vancouver for a couple of years and always enjoy going back, the mountains and the water, the parks and markets and the sea shore, dining at some of the city’s excellent restaurants, and of course snacking at wonderful, quirky Japadog.

The conference agenda was a mixed bag: a handful of Agile community rock stars re-playing old hits or pushing their latest books, including Martin Fowler, Johanna Rothman, and Mary Poppendieck; some consultants from ThoughtWorks and wherever else presenting commercials in the guise of case studies; and some earnest hands-on real developers telling war stories, from which you could hope to learn something.

I was surprised by the number of (mostly young), well-intentioned enthusiastic people at the sessions. There was sincere interest in the rooms; you could feel the intensity, the heat from so many questing minds. We were looking for answers, for insight, for research and experience.

But what we got wasn’t much unfortunately.

The rock stars were polished and confident, but mostly kept to safe, introductory stuff. I remember attending Martin Fowler’s keynote. Martin is indisputably a smart guy and worth listening to: I had the pleasure of spending a few days with him on round tables at last year’s Construx Software Executive Summit where we explored some interesting problems in software development. To be honest, I had to go back to my notes to remember what he spoke about in Vancouver: a couple of short talks, one on agile fundamentals and something smart about technical debt and simple design. If you’ve read Martin’s books and follow his posts, there was nothing new here, nothing to take back. Maybe I expected too much.

I decided to avoid the professional entertainment for a while and see what I could learn from some less polished, real-life practitioners. I stuck to the “hard” track, avoiding the soft presentations on team work, building trust and such.

A talk on “Agile vs the Iron Triangle” about using lightweight methods to deliver large projects delivered on a fixed cost, fixed schedule basis. How to make commitments, freezing the schedule and then managing scope – following incremental, build-to-schedule methods. Most of the challenges here of course are in estimating the size of work that needs to be done, understanding the team’s capacity to deliver the work, and making trade-offs with the customer: accepting but managing change, trading changes in scope in order to adhere to the schedule. This lecture was interesting because it was real, representing the efforts of an organization trying to reconcile plan-driven and agile practices, working with customers with real demands, under real constraints.

Another session was on operations at a small Internet startup where the development team was also responsible for operations. The focus here was on lightweight, open source operations tooling: essential tools for availability checks, log monitoring, performance and capacity analysis, system configuration using technology like Puppet. Nothing new here, but it was fun to see a developer so excited and focused on technical operations issues, and committed to keep the developers and operations staff working closely together as the company continued to grow.

Some more talks about the basics of performance tuning, an advertisement for ThoughtWorks Cruise continuous integration platform, and some other sessions that weren’t worth remembering. I had the most fun at Philippe Kruchten’s lecture on backlog management: recognizing and managing not only work for business features, but architecture / plumbing, and technical debt, “making invisible work visible”. Dr. Kruchten is an entertaining speaker, he clearly enjoys performing in front of a crowd, and he enjoys his work, his enthusiasm was infectious.

And finally a technical session by Michael Feathers on Error Processing and Error Handling as First Class Considerations in Design, who bucked the trend, playing the cool professor who could not care less if half the class was left behind. His focus was on idioms and patterns for improving error handling in code, in particular, the idea of creating “safe zones”, where you only need to worry about construction problems if you are at, or outside the edge of the zone, making for cleaner and more robust code in the safe core. Definitely the hardest, geekiest of the talks that I attended. And like several of the sessions I attended, it had little to do directly with agile development methods – instead it challenged the audience to think about ways to write good code, which is what it all comes down to in the end.

Michael Feathers aside, most of the speakers underestimated their audiences – at least I hope that they did – and spoke down, spoon feeding the newbies in the audience. It made for dull stuff much of the time – as earnest, or entertaining as the speaker might be, there wasn’t much to chew on. There could have been much more to learn with so many smart people there, and I wasn’t the only one looking for more meat, less bun. The conference wasn’t expensive, it was well managed, but it didn’t offer an effective forum to dig deep, to find new ways to build better software, or software better. For me, at least, there wasn't much ado.

Tuesday, December 8, 2009

Reliability and the Risks of Using Enterprise Middleware

If you are building systems with high requirements for performance and reliability, it is important that you are careful, selective of course, but even more important, sparing in your use of general-purpose middleware solutions to solve your technical problems.

There are strong, obvious arguments in favor of using proven middleware solutions, whether commercial off the shelf software (COTS) or open source solutions - arguments that are based on time-to-market, risk mitigation, and cost leveraging:

Time-to-market
In most cases, it will take much less time to evaluate, acquire, install, configure and understand a commercial product or open source solution than to build your own plumbing. This is especially important early in the project when your focus should be on understanding and solving important business problems, delivering value early, getting something working in the customer’s hands as soon as possible for feedback and validation.

Risk mitigation
Somebody has already gone down this path, taken the time to understand a complex technical problem space, made some mistakes and learned from them. The results are in front of you. You can take advantage of what they have already learned, and focus on solving your customer’s business problems, rather than risking falling into a technical black hole.

Of course you take on a different set of risks: that the solution is of high quality, that you will get adequate support (from the vendor or the community), that you not are buying into a dead end.

Cost leverage
For open source solutions, the cost argument is obvious: you can take advantage of the time and knowledge invested by the community for close to nothing.

In the case of enterprise middleware, companies like Oracle and IBM have spent an awful lot of money hiring smart people, or buying companies that were created by smart people, invested millions of dollars into R&D and millions more into their support infrastructures. You get to take advantage of all of this through comparatively modest license and support fees.

The do-it-yourself, not-invented-here arguments for building instead of buying are essentially that your company is so different, your needs are unique: that most of the money and time invested by Oracle and IBM, or the code built up by an open source community, does not apply to your situation, that you need something that nobody else has anticipated, nobody else has built.

I can safely say that this is almost always bullshit: naïve arguments put forward by people who might be smart, but are too intellectually lazy or inexperienced to properly understand and frame the problem, to bother to look at the choice of solutions available, to appreciate the risks and costs involved in taking a proprietary path. But, when you are pushing the limits in performance and reliability, it may actually be true.

A fascinating study on software complexity by NASA’s Office of the Chief Engineer Technical Excellence Program examines a number of factors that contribute to complexity and risk in high reliability / safety critical software systems (in this case flight systems), and success factors in delivery of these systems. One of the factors that NASA examined was the risks and benefits of using commercial off the shelf software (COTS) solutions:

Finding:

Commercial off-the-shelf (COTS) software can provide valuable and well-tested functionality, but sometimes comes bundled with additional features that are not needed and cannot easily be separated. Since the unneeded features might interact with the needed features, they must be tested too, creating extra work.

Also, COTS software sometimes embodies assumptions about the operating environment that don’t apply well to [specific] applications. If the assumptions are not apparent or well documented, they will take time to discover. This creates extra work in testing; in some cases, a lot of extra work.

Recommendation:

Make-versus-buy decisions about COTS software should include an analysis of the COTS software to: (a) determine how well the desired components or features can be separated from everything else, and (b) quantify the effect on testing complexity. In that way, projects will have a better basis for make/buy and fewer surprises.

The costs and risks involved with using off the shelf solutions can be much greater than this, especially when working with enterprise middleware solutions. Enterprise solutions offer considerable promise: power and scale, configuration to handle different environments, extensive management capabilities, interface plug-and-play… all backed up by deep support capabilities. But you must factor in the costs and complexities of properly setting up and working with these products, and the costs and complexities in understanding the software and its limits: how much time and money must you invest in a technology before you know if it is a good fit, if it fulfills its promise?

Let’s use the example of an enterprise middleware database management system, Oracle’s Real Application Cluster (RAC) maximum availability database cluster solution.

Disclaimer: I am not an Oracle DBA, I am not going to argue fine technical details here. I chose RAC because of recent and extensive experience working with this product, because it is representative of the problems that teams can have working with enterprise middleware. I could have chosen other technologies from other projects, say Weblogic Suite or Websphere Application Server and so on, but I didn’t.

The promise of RAC is to solve many of the problems of managing data and ensuring reliability in high-volume, high-availability distributed systems. RAC shares and manages data across multiple servers, masks failures and provides instant failover in an active-active cluster, and allows you to scale the system horizontally, adding more servers to the cluster as needed to handle increasing demands. RAC is a powerful data management solution, involving many software layers, including clustering and storage management and data management and operations management, designed to solve a set of complex problems.

In particular, one of these technical problems is maintaining cache fusion across the cluster: fusing the in-memory data on each server together into a global, cluster-wide cache so that each server node in the cluster can access information locally as it changes on any other node.

As you would expect, there are limits to the speed and scaling of cluster-wide cache fusion, especially at high transaction rates. And this power and complexity comes with costs. You need to invest both in infrastructure, in a highly reliable and performant network interconnect fabric and shared storage subsystem, and in making fundamental application changes, to carefully and consistently partition data within the database and carefully design your indexes in order to minimize the overhead costs of maintaining global cache state consistency. As the number of server nodes in the cluster increases (for scaling purposes or for higher availability), the overhead costs and the costs involved in managing this overhead increase.

RAC is difficult to setup, tune and manage in production conditions: this is to be expected – the software does a lot for you. But it is especially difficult to setup, tune and manage effectively in high-volume environments with low tolerance for variability and latency, where predictable performance under sustained load, and predictable behavior in failure situations, is required. It requires a significant investment in time to understand the trade-offs in setup and operations of RAC, to balance reliability and integrity factors against performance; choosing between automated and manual management options, testing and measuring system behavior, setting up and testing failover scenarios, carefully managing and monitoring system operations. To do all of this will require you to invest in setting up and maintaining test and certification labs, in training for your operations staff and DBAs, in expert consulting and additional support from Oracle.

To effectively work with enterprise technology like this, at or close to the limits of its design capabilities, you need to understand it in depth: this understanding comes from months of testing and tuning your system, working through support issues and fixing problems in the software, modifying your application and re-testing. The result is like a race car engine: highly optimized and efficient, running hot and fast, highly sensitive to change. Upgrades to your application or to the Oracle software must be reviewed carefully and extensively tested, including planning and testing rollback scenarios: you must be prepared to manage the very real risk that a software upgrade can affect the behavior of the database engine or cluster manager or operations manager or other layers, impacting the reliability or performance of the system.

Clearly one of the major risks of working with enterprise software is that it is difficult, if not impossible, to learn enough about the costs and limits of this technology early enough in the project – especially if you are pushing these limits. Hiring experienced specialists, bringing in expert consultants, investing in training, testing in the lab: all of this might not be enough. While you can get up and running much faster and cheaper than you would trying to solve so many technical problems yourself from the start, you face the risk that you may not understand the technology well enough, the design points and real limits, how to make the necessary balances and trade-offs – and whether these trade-offs will be acceptable to you or your customers. The danger is that you become over-invested in the solution, that you run out of time or resources to explore alternatives, that you give yourself no choice.

You are making a big bet when working with enterprise products. The alternative is to avoid making big bets, avoid having to find big solutions to big problems. Break your problems down, and find narrow, specific answers to these smaller, well-bounded problems. Look for lightweight, single-purpose solutions, and design the simplest possible solution to the problem if you have to build it yourself. Spread the risks out, attack your problems iteratively and incrementally.

In order to do this you need to understand the problem well – but whether you break the problem down or try to solve it with an enterprise product, you can’t avoid the need to understand the problem. Look (carefully) at the options available, at open source and commercial products, look for the smallest, simplest approach that fits. Don’t over-specify, or design, yourself into a corner. Don’t force yourself to over-commit. And think twice, or three or four times, before looking at an enterprise solution as the answer.

Sunday, November 22, 2009

Small Teams, Big Results

How big a team do you need to deliver big results?

When my partners and I created this startup a few years ago, we made the decision to staff the development team with people that we knew, strong developers and test engineers who we had worked with before, people who we trusted and who trusted us. There were a lot of unknowns in what we were trying to achieve, could we secure enough funding, would our business model succeed, did we choose the right technologies, did we have the right design, could we handle all of the details and problems in launching a new market, could we deal with all of the integration and service needs, the regulatory and compliance requirements, …. and all of the other challenges that startups with aggressive goals face. So we wanted to take the risks out of development as much as possible, to assure stakeholders that we could deliver what we needed to, when we needed it.

We were lucky to put together a strong team of senior people: many of them we had worked with for several years, and some of them on only one project. But they were known quantities: we knew what to expect, and so did they. They understood the problem domain well and came up to speed on our design, and they came together as a team quickly of course, re-establishing relationships and norms – so we could hit the ground running. And we’re even more fortunate in that the core of the team has stayed in place from the start, and that we have been able to carefully add a few more senior people to the team, so that we continue leverage the experience and knowledge that everyone has built up.

There are tremendous advantages in working with a small group of experienced people who know what they are doing and care about doing a good job, people who enjoy challenges and who work well together.

In 10x Software Engineering, Construx Software examines the key factors that contribute to exceptional performance in software development: the factors and good engineering practices that drive some individuals and teams to outperform others by up to 10x. Some of these key success factors are keeping teams small, keeping teams together, and leveraging experience: that small teams of senior people, with a strong sense of identity and high levels of trust, staying together through projects, can significantly outperform the norm.

The value of small, experience-heavy teams, and especially of senior people who are deeply committed to doing a good job, committed to their craft, is explored in Pete McBreen’s excellent book Software Craftsmanship: the New Imperative. Pete shows that developers who have worked together in the past are more productive than teams created from scratch: that it is an important success factor for teams to be able to choose who to work with, to choose people who they know they can depend on, and who they feel comfortable working with. He especially emphasizes the importance of experience: that a jelled team of experienced people, working in an open and trusting way, can amplify each other’s strengths and work at hyper-productive levels, and that in a hyper-productive team of experienced developers who are playing at the top of their game, there is little space for beginners or “warm bodies”.

Pete also looks at the issue of team size in Questioning Extreme Programming, a skeptical but balanced review of XP practices which deserves more attention. Pete suggested at the time that

"XP is best suited to projects with a narrow range of team sizes, probably 4 to 12 programmers. Outside that range, other processes are probably more appropriate. The good news, however, is that a great many projects fall into the range of applicability. Indeed, there is some evidence that XP-size projects are predominant in the industry.”

Although my focus here is not on XP practices, the idea that most problems that the industry faces can be managed by small teams, following lightweight but disciplined practices, is an important one.

Back in March 2008, Steve McConnell asked a question on the Construx Forum about how to scale up a development team quickly. My answer would be to keep the core team as small as possible, add people that other people have worked with before and know and trust, and add fewer, more senior, experienced, technically strong staff.

I have worked a lot with large technology companies like IBM and HP, and I was surprised to find out how small the core engineering teams are in those big companies. Where a company like IBM might have a big distributed first-level and second-level support organization, trainers, offshore testing labs, product managers, marketing teams, technical writers, pre-sales support engineers, sales teams, vertical industry specialists, integration specialists, project managers and other consultants: all of these people leverage the IP created by small, senior teams of engineers and researchers. These engineering teams, even at a company like IBM, have a different culture than the customer-facing parts of the organization – less formal and more inward-focused on technical excellence, on “alpha geekdom” – and were more free to come up with new ideas. Google, of course, is an extreme example of this: a large company where lots of software is created by very small, very senior teams driven to technical excellence.

It makes sense to follow the same model in scaling out a team: start with a small, senior core, be careful to add a few senior people, space out your hires, and scale out primarily in supporting roles, allowing the small core engineering team to continue to focus on results, on excellence.

One of the many advantages of small teams is that they spend less time and make fewer mistakes communicating with each other, you can use less formal (and less expensive) and more efficient communication methods. This lets you move faster, and adapt faster to change. If the team is made up of mostly of experienced, senior staff, they can get maximum value out of lightweight “small a” agile methods, take on less unintentional technical debt, again reducing cost and time, and by making fewer mistakes and writing better code in the first place, create a higher quality product, further accelerating results in a virtuous circle.

The key here is to have enough discipline, to follow enough good engineering practices, without weighing the team down too much.

In Nailing the Nominals, Eric Brechner, in charge of engineering excellence at Microsoft, sets the limit at 100,000 lines of code and 15 people. Below this line,

"you can…use emergent design, have a loose upfront design bar, rewrite and refactor the code endlessly while the customer looks over your shoulder. When your code base and your project is bigger, it's solid design and disciplined execution or it's broken code, broken teams, and broken schedules."

In another related post, Green Fields are Full of Maggots, I.M. Wright, er, I mean Eric, goes on to say that

"the regular refactoring and rework needed for emergent design isn't a problem for small code bases. However, when you get above 100,000 lines of code serious rework becomes extraordinarily expensive. That's why customer-focused, up-front architecture is essential for large projects.

This was researched by Dr. Barry Boehm, et al, in May 2004. They found that rework costs more than up-front design on projects larger than 100 KLOC. The larger the project, the more up-front design saves you."

What’s of particular interest to me is that we work right on the edge of these limits, in terms of size of code base (although we also have a lot of test code and other supporting code) and total team size. Our extra focus on discipline and controls is necessary because of the high standards of reliability that we need to stand up to, and the complexity of the problems that we solve. While we could move even faster, the risk and cost of making mistakes is too high. So the challenge is to achieve the right balance, between speed and efficiency and discipline.

Saturday, October 17, 2009

The real cost of software security

There has been a lot of discussion in the blogosphere over the last few months on costs and ROI justifications for building secure software. Back in July, I responded to a post by Jeremiah Grossman, CTO at White Hat Software, which examined the end-to-end costs of software security, whether and how upfront investments in a secure SDLC mitigate downstream security costs and risks: a classic “pay me now or pay me (much more) later” problem. In my response to Jeremiah’s analysis, I tried to break out the costs of building secure software: what I now think of as direct, “hard” pure security costs, compared to indirect, "soft" supporting costs, the costs of building software properly in the first place. From a budgeting and ROI perspective, it is important to break out the costs of building software correctly in the first place, the foundational practices, from real security costs.

An effective software security program has to rely on a foundation of software quality management practices and governance. Your quality management program can be lightweight and agile, but if you are hacking out code without care for planning, risk management, design, code reviews, testing, incident management then what you are trying to build is not going to be secure. Period. Coming from a software development background, I feel that the security community is trying to take on too much of the responsibility, and too much of the costs, for ensuring that basic good practices are in place.

But it’s more than that. If you are doing risk management, design reviews, code reviews, testing, change and release management, then adding security requirements, perspectives, controls to these practices can be done simply, incrementally, and at a modest cost:

Risk management: add security risk scenarios, threat assessment to your technical risk management program.
Design reviews: add threat modeling where it applies, just as you should add failure mode analysis for reliability.
Code reviews: first you can argue, as I did earlier in my response, that a significant number of security coding problems are basic quality problems, simply a matter of writing crappy code: poor or missing input validation (a reliability problem, not just a security problem), lousy error handling (same), race conditions and deadlocks (same), and so on. If making these kinds of mistakes a security problem (and a security cost) helps get programmers to take them seriously, then go ahead, but it shouldn’t be necessary. So what you are left with are the costs of dealing with “hard” security coding problems, like improper use of crypto, secure APIs, authentication and session management, and so on.
Static analysis: as I argued in an earlier post on the value of static analysis, static analysis checks should be part of your build anyways. If the tools that you use, like Coverity or Klocwork, check for both quality and security mistakes, then your incremental costs are in properly configuring the tools and understanding, and dealing with, the results for security defects.
Testing: security requirements need to be tested along with other requirements, on a risk basis. Regression tests, boundary and edge tests, stress tests and soak tests, negative destructive tests (thinking like an attacker), even fuzzing should be part of your testing practices for building robust and reliable software anyways.
Change management and release management: ensure that someone responsible for security has a voice on your change advisory board. Add secure deployment checks and secure operations instructions to your release and configuration management controls.
Incident management: security vulnerabilities should be tracked and managed as you would other defects. Handle security incidents as “level 1” issues in your incident management, escalation, and problem management services.

So what are the direct costs of a software security program? Looking over my own budget, these are the major cost items that I can find:

Training managers, architects, developers, testers on security concepts, issues, threats, practices. You need to invest in training upfront, and then refresh the team: Microsoft’s SDL requires that technical staff be retrained annually, to make sure that the team is aware of changes to the threat landscape, to attack bad habits, to reinforce good ones.
As I described in an earlier post on our software security roadmap , we hired expert consultants to conduct upfront secure architecture and code reviews, to help define secure coding practices, and to work with our architects and development manager to plan out a roadmap for software security. Paying for a couple of consulting engagements was worthwhile and necessary to kickstart our software security program and to get senior technical staff and management engaged.
Buying “black box” vulnerability scanning tools like IBM Rational Appscan and Nessus, and the costs of understanding and using these tools to check for common vulnerabilities in the application and the infrastructure.
Penetration testing: hiring experts to conduct penetration tests of the application a few times per year, to check that we haven’t got sloppy and missed something in our internal checks, tests and reviews, and to learn more about new attacks.
Some extra management and administrative oversight of the security program, and of suppliers and partners.

The other incremental costs of building secure software, like the costs for building robust and reliable software, are now effectively burned in to our SDLC, into how we plan, design, build, test, deploy and support software. I could break out the incremental cost burden of these security practices and controls, but the costs would be modest - most of the cost, and the work, is in building software properly. And by following an incremental, optimizing approach, starting small and continuously reviewing and improving, not only are upfront costs for a software security program reduced, but the ROI is realized much faster. If you set your quality bar high enough, the real costs of secure software are surprisingly low.

Saturday, October 10, 2009

Dreaming in Code - A Failure in Leadership

Reading Scott Rosenberg’s Dreaming in Code gives you a sick feeling, the same sick feeling that you have watching a movie where the hero’s life is coming unraveled, or when you are involved in a project that is going nowhere fast, facing certain failure, and there is nothing that you can do to change the outcome. I made myself read it twice – there are some hard lessons for managing software development in this book.

Dreaming in Code tells the story of the failed Chandler project started by Mitch Kapor, software industry visionary and founder of Lotus Development Corp, currently Chairman of the Mozilla Foundation. Chandler began as an ambitious, “change the world” project to design and build a radically different kind of personal information manager, an Outlook-and-Exchange-killer that would flatten silos between types of data and offer people new ways of thinking about information; provide programmers a cross-platform, extensible open source platform for new development; and create new ways to share data safely, securely, cheaply and democratically, in a peer-to-peer model.

The project started in the spring of 2001. Because of Mitch Kapor’s reputation and the project’s ambitions, he was able to assemble an impressive team of alpha geeks. Smart people, led by a business visionary who had experienced success, interesting problems to solve, lots of money, lots of time to play with new technology and chart a new course: a “dream project”.

But Dreaming in Code is a story of wasted opportunities. Scott Rosenberg, the book’s author, followed the project for 3 years starting in 2003, but he gave up before the team had built something that worked – he had a book to publish, whether the story was finished or not. By January 2008, Mitch Kapor had called it quits and left the company that he founded. Finally, in August 2008, the surviving team released version 1.0, scaled back to a shadow of its original goals and of course no longer relevant to the market.

It is interesting, but sad, to map this project against Construx’s Classic Mistakes list, to see the mistakes that were made:

Unclear project vision. Lack of project sponsorship. Requirements gold-plating. Feature creep. Research-oriented development. Developer gold-plating. Silver bullet syndrome. Insufficient planning. Adding people to a late project. Overly optimistic schedules. Wishful thinking. Unrealistic expectations. Insufficient risk management. Wasted time in the “fuzzy front end”: the team spent years wondering and wandering around, playing with technology, building tools, exploring candidate designs, trying to figure out what the requirements were - and never understood the priorities. Shortchanged quality assurance… hold on, quality was important to this project. It wasn’t going to play out like other Silicon Valley startups. Then why did they wait almost 3 years before hiring a test team (of one person), while they faced continual problems from the start with unstable releases, unacceptable performance, developers spending days or weeks bug hunting.

The book begins with a meeting in July 2003, when the team should be well into design and development, where the project manager announces that the team is doomed, that they cannot hope to deliver to their commitments – and not for the first time. This is met with…. well, nothing. Nobody, not even Mitch Kapor, takes action, makes any decisions. It doesn't get any better from there.

This project was going to be different, it was going to be done on a “design-first” basis. But months, even years into the project, the team is still struggling to come up with a useful design. The team makes one attempt at time-boxed development, a single iteration, and then gives up. Senior team members leave because nothing is getting done. Volunteers stop showing up. People in the community stop downloading the software because it doesn’t work, or it does less than earlier versions – the team is not just standing still, they are moving backwards.

The project manager, after only a few months, quits. Mitch Kapor takes over. A few months later, the “fun draining out of his job”, he asks the team to redesign his job, and come up with a new management structure. OK, to be fair, he is rich and doesn’t have to work hard on this, but why start it all in the first place, why put everyone involved, even those of us who are going to read the book, through all of this?

The new management team works through a real planning exercise for this first time, making real trade-offs, real decisions based on data. It doesn’t help. They fail to deliver – again. And again with an alpha release. They then spend another 21 months putting together a beta release, significantly cutting back on features. By the time they deliver 1.0 nobody cares.

It’s a sad story. It’s also boring at times – the team spends hours, days in fruitless, philosophical bull sessions; in meaningless, abstract, circular arguments; asking the same questions, confronting the same problems, facing (or avoiding) the same decisions again and again. As the book says, “it’s Groundhog Day, again”. I hope that the writer may have not fully captured what happened in the design sessions and planning meetings – that, like the Survivor tv show, we only see what the camera wants us too, or happens to, an incomplete picture. That these people were actually more focused, more serious, more responsible than they come across.

The project failed on so many levels. A failure to set understandable, achievable goals. A failure to understand, or even attempt to articulate requirements. A failure to take advantage of talent. A failure to engage, to establish and maintain momentum. A failure to manage risks. A failure to learn from mistakes - their own, or others.

Most fundamentally, it was a failure of leadership. Mitch Kapor failed to hold himself and the team accountable. He failed to provide the team with meaningful direction, failed to understand and explain what was important. He failed to create an organization where people understood what it took to deliver something real and useful, where people cared about results, about the people who would, hopefully, someday, use what they were supposed to be building. And he gave up: he gave up well before 2008 when he left the company; he gave up almost from the start, he gave up when the hard, real work of building out his vision had actually begun.

Wednesday, October 7, 2009

A Joel Test for Software Security

Back in 2000, Joel Spolsky, software developer, entrepreneur, founder of StackOverflow and popular blogger on the business of building software, proposed a “highly irresponsible, sloppy test to rate the quality of a software team”, known as The Joel Test.

The Joel Test is a crude but effective tool for checking the maturity of a software development team, using simple, concrete questions to determine whether a team is following core best practices. The test could use a little sprucing up, to reflect improvements in the state of the practice over the last 10 years, to take into account some of the better ideas introduced with XP and Scrum. For example, “Do you make daily builds?” (question 3) should be updated to ask whether the team is following Continuous Integration. And you can argue that “Do you do hallway usability testing” (question 12) should be replaced with a question that asks whether the team works closely and collaboratively with the customer (or customer proxy) on requirements, product planning and prioritization, usability, and acceptance. And one of the questions should ask whether the team conducts technical (design and code) reviews (or pair programming).

A number of other people have considered how to improve and update the Joel Test. But all in all, the Joel Test has proved useful and has stood the test of time. It is simple, easy to remember, easy to understand and apply, it is directly relevant to programmers and test engineers (the people who actually do the work), it is provocative and it is fun. It makes you think about how software should be built, and how you measure up.

How does the Joel Test work?

It consists of 12 concrete yes/no questions that tell a lot about how the team works, how it builds software, how disciplined it is. A yes-score of 12 is perfect (of course), 11 is tolerable, a score of 10 or less indicates serious weaknesses. The test can be used by developers, or by managers, to rate their own organization; by developers who are trying to decide whether to take a job at a company (ask the prospective employer the questions, to see how challenging or frustrating your job will be); or for due diligence as a quick “smoke test”.

A recent post by Thomas Ptacek of Matsano Security explores how to apply the Joel Test to network security and IT management. In the same spirit, I am proposing a “Joel Test” for software security: a simple, concrete, informal way to assess a team’s ability to build secure software. This is a thought experiment, a fun way of thinking about software security and what it takes to build secure software, following the example of the Joel Test, its principles and its arbitrary 12-question framework. It is not, of course, an alternative to comprehensive maturity frameworks like SAMM or BSIMM, which I used as references in preparing this post, but I think a simple test like this can still provide useful information.

So, here is my attempt at the 12 questions that should be asked in a Software Security “Joel Test”:

1. Do you have clear and current security policies in place so developers know what they should be doing, and what they should not be doing? Realistic, concrete expectations, not legalese or boiler plate. Guidelines that programmers can follow and do follow in building secure software.

2. Do you have someone (person or a team) who is clearly responsible for software security? Someone who helps review design and code from a security perspective, who can coach and mentor developers and test engineers, provide guidelines and oversight, make risk-based decisions regarding security issues. If everybody is accountable for security, then nobody is accountable for security. You need to have someone who acts as coach and cop, who has knowledge and authority.

3. Do you conduct threat modeling, as part of, or in addition to, your design reviews? This could be lightweight or formal, but some kind of structured security reviews need to be done especially for new interfaces, major changes.

4. Do your code reviews include checks for security and safety issues? If you have to ask, “ummm, what code reviews?”, then you have a lot of work ahead of you.

5. Do you use static analysis checking for security (as well as general quality) problems as part of your build?

6. Do you perform risk-based security testing? Does this include destructive testing, regular penetration testing by expert pen testers, and fuzz testing?

7. Have you had an expert, security-focused review of your product’s architecture and design? To ensure that you have a secure baseline, or to catch fundamental flaws in your design that need to be corrected.

8. Do product requirements include security issues and needs? Are you, and your customers, thinking about security needs up front?

9. Does the team get regular training in secure development and defensive coding? Microsoft’s SDL recommends that team members get training in secure design, development and testing at least once per year to reinforce good practices and to stay current with changes in the threat landscape.

10. Does your team have an incident response capability for handling security incidents? Are you prepared to deal with security incidents, do you know how to escalate, contain and recover from security breaches or respond to security problems found outside of development, communicate with customers and partners.

11. Do you track security issues and risks in your bug database / product backlog for tracking and followup? Are security issues made visible to team members for remediation?

12. Do you provide secure configuration and deployment and/or secure operations guidelines for your operations team or customers?

These are the 12 basic, important questions that come to my mind. It would be interesting to see alternative lists, to find out what I may have missed or misunderstood.

Sunday, September 27, 2009

Risk Management - You Don't Have to Waltz with Bears

I recently finished reading Waltzing with Bears: Managing Risk on Software Projects by Tom DeMarco and Tim Lister, both recognized experts in software development and risk management. The material in this book covers much of the same territory as courses that I attended several years ago in software project management and risk management presented by these authors through The Atlantic Systems Guild.

This work is based on their experience as consultants and as expert witnesses in contract disputes and litigation over software project failures. The authors make a strong case that effective risk management is essential to the success of any software project worth doing; that you must be prepared to face failure and deal with uncertainty; that you must actively manage risks, by containing risks through schedule and budget buffers, or proactively mitigating risks by taking steps to reduce the probability and/or impact of a problem; that you must consider alternatives for any critical activities or work items; and that managing for success, attempting to evade risks, is the path to failure.

Waltzing with Bears focuses on the kinds of problems faced by (and effectively created by) large waterfall projects: trying to commit to scope and schedule and cost up front when there isn’t enough information to do so; and trying to account for and manage the unknown and unaccountable. It’s an almost hopeless situation, but the authors provide ideas and disciplines and tools that at least offer a better chance at success.

What’s necessary is to change the rules of the game, to consider other ways of building software.

In an earlier post, I explored how risk management can and should be burned into the way that you develop software; how schedule and scope and quality risks and other risks can be managed through the development lifecycle you choose and the engineering practices that your team follows. Johanna Rothman, in a paper titled “What Lifecycle: Selecting the Right Model for your Project” explores some of the same ideas, how to manage schedule risks and other risks through lifecycle models, in particular incremental and iterative development approaches.

It is clear to me that following incremental, iterative, timeboxed development, as in Extreme Programming and Scrum, will effectively mitigate many of the common risks and issues that concern the authors of Waltzing with Bears. To some extent, the authors agree, when they conclude that

“The best bang-per-buck risk mitigation strategy we know is incremental delivery” (by which they mean Staged Delivery), “development of a full or nearly full design, and then the implementation of that design in subsets, where each successive subset incorporates the ones that preceded it”.

While Scrum (interestingly) does not explicitly address risk management; it does mitigate scope, schedule, quality, customer and personnel risks through its driving principles and management practices: incremental timeboxed delivery (sprints), close collaboration within a self-managing team and with the customer, managing the backlog (scope) together with the customer (Product Owner), and daily standup meetings and retrospectives which allow the team to continuously adjust to issues and changes.

Extreme Programming (XP) recognizes and confronts risk directly and fundamentally – chapter 1 of Extreme Programming Explained: Embrace Change begins:

“The basic problem of software development is risk”.

In The Case for XP Chris Morris explains that XP is resilient to risk, that it inherently accepts change and uncertainty, rather than attempting to anticipate risk, to predict and manage dangers up front; building on a risk management model developed by political scientist Aaron Wildavsky.

XP addresses risk through:

- short iterations with fine-grained feedback

- keeping the customer close

- test-driven development to maintain a quality baseline

- refactoring and pair programming to ensure code quality

- continuous integration

- simple design

While you can manage most types of risks effectively through your SDLC especially by following incremental, iterative techniques and disciplined engineering practices, there are two general classes of risk that require active and continuous risk discovery and explicit risk management, using tools and ideas such as the ones detailed in Waltzing with Bears, Steve McConnell’s Top 10 Risk List, and the formal methods and techniques taught by the Project Management Institute (this is where my PMP comes in handy):

1. Project risks outside of your team’s work in software development, but which directly impact your success. These include: sponsor and stakeholder issues and other political risks, larger business issues outside of your control, regulatory changes, reliance on delivery from partners and sub-contractors, implementation and integration with partners and customers.

2. Technical risks in the platform, architecture or design. This is especially important if you are building enterprise, high reliability systems such as a telco, banking systems, large e-commerce sites, or financial trading. Some lifecycle and SDLC factors help to mitigate technical risks, such as prioritizing work that is technically difficult in XP, or using exploratory prototyping not only for customer feedback but for technical proof of concept work. But to ensure that your product works in the real world, the team constantly needs to consider technical risk: difficult problems to solve; fragile or complex parts of the system; areas where you are pushing beyond your technical knowledge and experience. Using this information you can determine where to focus your attention in technical reviews and testing; what to try early and prove out; what to let soak; what you should have your best people work on.

If you are building software incrementally and carefully, you won’t have to “waltz with bears”, but you still need to continuously look for, and actively manage, risks inside and outside of the work that your team is doing.

Sunday, August 30, 2009

Lessons in Software Reliability

What does it take to build reliable and stable enterprise software?

First, stop writing lousy code

It’s unfortunate that few developers are familiar with The MITRE Corporation’s Common Weakness Enumeration list of common software problems. The CWE is a fascinating and valuable resource, not just to the software security community, but to the broader development community. Reading through the CWE, it is disappointing to see how many common problems in software, problems that lead to serious security vulnerabilities and other serious problems, are caused by sloppy coding: not missing the requirements, not getting the design wrong or messing up an interface, but simple, fundamental, stupid construction errors. The CWE is full of mistakes like: null pointers, missing initialization, resource leaks, string handling mistakes, arithmetic errors, bounds violations, bad error handling, leaving debugging code enabled, and language-specific and framework-specific errors and bad practices – not understanding, improperly using the frameworks and APIs. OK there are some more subtle problems too, especially concurrency problems, although we should reasonably expect developers by now to understand and follow the rules of multi-threading to avoid race conditions and deadlocks.

The solution to this class of problems are simple, although they require discipline:

- Hire good developers and give them enough time to do a good job, including time to review and refactor.

- Make sure the development team has training on the basics, that they understand the language and frameworks.

- Regular code reviews (or pair programming, if you’re into it) for correctness and safety.

- Use static analysis tools to find common coding mistakes and bug patterns.

Design for Failure

Failures will happen: make sure that your design anticipates and handles failures. Identify failures, contain, retry, recover, restart. Contain failures, ensure that failures don’t cascade. Fail safe. Look for the simplest HA design alternative: do you need enterprise-wide clustering or virtual synchrony-based messaging, or can you rely on simpler active/standby shadowing with fast failover?

Use design reviews to hunt down potential failures and look for ways to reduce the risk, prevent failure, or recover. Microsoft’s The Practical Guide to Defect Prevention, while academic at times, includes a good overview of Failure Modes and Effects Analysis (FMEA), a structured design review and risk discovery method similar to security threat modeling, focused on identifying potential causes of failures, then designing them out of the solution, or reducing their risk (impact or probability).

Cornell University’s College of Engineering also includes a course on risk management and failure modes analysis in its new online education program on Systems Approach to Product and Service Design.

Keep it Simple

Attack complexity: where possible, apply Occam’s Razor, and choose the simplest path in design or construction or implementation. Simplify your technology stack, collapse the stack, minimize the number of layers and servers.

Use static analysis to measure code complexity (cyclomatic complexity or others) and trending: is the code getting more or less complex over time. There is a correlation between complexity and quality (and security) problems. Identify code that is over-complex, look for ways to simplify it, and in the short term increase test coverage.

Test… test… test…

Testing for reliability goes beyond unit testing, functional and regression testing, integration, usability and UAT. You need to test everything you can every way you can think of or can afford to.

A key idea behind Software Reliability Engineering (SRE) is to identify the most important and most used scenarios for a product, and to test the system the way it is going to be used, as close as possible to real-life conditions: scale, configuration, data, workload and use patterns. This gives you a better chance of finding and fixing real problems.

One of the best investments that we made was building a reference test environment, as big as, and as close to the production deployment configuration, as we could afford. This allowed us to do representative system testing with production or production-like workloads, as well as variable load and stress testing, operations simulations and trials.

Stress testing is especially important: identifying the real performance limits of the system, pushing the system to, and beyond, design limits, looking for bottlenecks and saturation points, concurrency problems – race conditions and deadlocks – and observing failure of the system under load. Watching the system melt down under extreme load can give you insight into architecture, design and implementation weaknesses.

Other types of testing that are critical in building reliable software:

- Regression testing – relying especially on strong automated testing safety nets to ensure that changes can be made safely.

- Multi-user simulations – unstructured, or loosely structured group exploratory testing sessions.

- Failure handing and failover testing – creating controlled failure conditions and checking that failure detection and failure handling mechanisms work correctly.

- Soak testing (testing standard workloads for extended periods of time) and accelerated testing (playing at x times real-life load conditions) to see what breaks, what changes, and what leaks.

- Destructive testing – take the attacker’s perspective, purposefully set out to attack the system and cause exceptions and failures. Learn How to Break Software.

- Fuzz testing: simple, brute force automated attacks on interfaces, a testing technique that is valuable for reliability and security. Read Jonathan Kohl’s recent post on fuzz testing.

Get in the trenches with Ops

Get the development team, especially your senior technical leaders, working closely with operations staff: understanding operations' challenges, the risks that they face, the steps that they have to go through to get their jobs done. What information do they need to troubleshoot, to investigate problems? Are the error messages clear, are you logging enough useful information? How easy is it to startup, shutdown, recover and restart – the more steps, the more problems. Make it hard for operations to make mistakes: add checks and balances. Run through deployment, configuration and upgrades together: what seems straightforward in development may have problems in the real world.

Build in health checks – simple ways to determine that the system is in a healthy, consistent state, to be used before startup, after recovery / restart, after an upgrade. Make sure operations has visibility into system state, instrumentation, logs, alerts – make sure ops know what is going on and why.

When you encounter a failure in production, work together with the operations team to complete a Root Cause Analysis, a structured investigation where the team searches for direct and contributing factors to the failure, defines corrective and preventative actions. Dig deep, look past immediate causes, keep asking why. Ask: how did this get past your checks and reviews and testing? What needs to be changed in the product? In the way that it is developed? In the way that is implemented? Operated?

And ensure that you followup on your corrective action plan. A properly managed RCA is a powerful tool for organizational learning and improvement: it forces you to think, to work together, creates a sense of accountability and transparency.

Change is bad…. but change is good

You don’t need to become an expert in ITIL, but if you have anything to do with developing or supporting enterprise software, at least spend a day reading Visible Ops. This brief overview of IT operations management explains how to get control over your operations environment. The key messages are:

Poor change management is the single leading cause of failures: 80% of IT system outages are caused by bad changes by operations staff or developers. 80% of recovery time (MTTR) is spent determining what changed.

The corollary: control over change not only improves reliability, it makes the system cheaper to operate, and more secure.

Change can be good: as long as changes are incremental, controlled, carefully managed and supported by good tools and practices. When the scope of change is contained, it is easier to get your head around, review and test. And with frequent change, everyone knows the drill – the team understands the problems and is better prepared if any problems come up.

Implement change control and release management practices. Include backout planning, rollback planning and testing. Taking compatibility into account in your design and implementation. Create checklists, reviews.

Safety First

Reliable software, like secure software, doesn’t come for free, especially up front, when you need to effect changes, put in more controls. You must have management, and customer, support. You need to change the team’s way of thinking: to use risk management to drive priorities, shape design and implementation and planning. Get your best people to understand and commit: the rest will follow.

Keep in mind of course that there are limits, that tradeoffs need to be made: most of us are not building software for the space shuttle. In Software Quality at Top Speed, Steve McConnell shows that development teams that build better quality, more reliable software actually deliver faster, up to a peak efficiency of 95% defects removed before production release. However, you reach a point of rapidly diminishing returns as you approach the end of the curve, attempting to hit 100% defect-free software, where costs and schedule increase significantly.

Timeboxing is an effective technique to contain scope and cost: do as much as you can, as good as you can, within a hard time limit. Following Japanese manufacturing principles, make sure that anyone on the team can pull the cord and postpone a release or cancel a feature because it is unstable.

It is sobering, almost frightening, how easy it is, how natural it is, for developers and managers to short-change quality practices, to place feature delivery ahead of reliability, especially under pressure. Ensure that you build support across the organization, build a culture that puts reliability first. Like any change, it will require patience, commitment, and unrelenting followup.

Thursday, June 25, 2009

The Value of Static Analysis Tools

Just how effective is static analysis, what does it protect you from?

There is a lot of attention given to static analysis tools, especially from the software security community - and some serious venture capital money being thrown at static analysis tool providers such as Coverity.

The emphasis on using static analysis tools started with Cigital's CTO Gary McGraw in his definitive book on Software Security: Building Security In. In a recent interview with Jim Manico for OWASP (Jan 2009), Dr. McGraw went so far as to say that

“My belief is that everybody should be using static analysis tools today. And if you are not using them, then basically you are negligent, and you should prepare to be sued by the army of lawyers that have already hit the beach”.

Statements like this, from a thought leader in the software security community, certainly encourage companies to spend more money on static analysis tools, and of course should help Cigital’s ownership position in leading static analysis tool provider Fortify Software, which Cigital helped to found.

You can learn more about the important role that static analysis plays in building secure systems from Brian Chess, CTO of Fortify, in his book Secure Programming with Static Analysis.

Secure software development maturity models like SAMM and BSIMM emphasize the importance of code reviews to find bugs and vulnerabilities, but especially the use of static analysis tools, and OWASP has a number of free tools and projects in this area.

Now even Gartner has become interested in this the emerging emerging static analysis marketand its players – evidence that the hype is reaching, or has reached, a critical point. In Gartner’s study of what they call Static Application Security Testing (SAST) suppliers (available from Fortify), they state that

“...enterprises must adopt SAST technology and processes because the need is strategic. Enterprises should use a short-term, tactical approach to vendor selection and contract negotiation due to the relative immaturity of the market.” Well there you have it: whether the products are ready or not, you need to buy them.

Gartner’s analysis puts an emphasis on full-service offerings and suites: principally, I suppose, because CIOs at larger companies, who are Gartner’s customers, don’t want to spend a lot of time finding the best technology and prefer to work with broad solutions from strategic technology partners, like IBM or HP (neither of which has strong static analysis technology yet, so watch for acquisitions of the independents to fill out their security tool portfolios, as they did in the dynamic analysis space). Unfortunately, this has led vendors like Coverity to spend their time and money on filling out a larger ALM portfolio, building and buying in technology for build verification, software readiness (I still don’t understand who would use this) and architecture analysis rather than investing in their core static analysis technology. On their site, Coverity proudly references a recent story "Coverity: a new Mercury Interactive in the making?", which should make their investors happy and their customers nervous – as a former Mercury Interactive, now HP customer, I can attest that while the acquisition of Mercury by HP may have been good for HP and good for Mercury’s investors, it was not good for Mercury’s customers, at least the smaller ones.

The driver behind static analysis is its efficiency: you buy a tool, you run it, it scans thousands and thousands of lines of code and finds problems, you fix the problems, now you’re secure. Sounds good, right?

But how effective is static analysis? Do these tools find real security problems?

We have had success with static analysis tools, but it hasn’t been easy. Starting in 2006, some of our senior developers started working with FindBugs (we’re a Java shop) because it was free, it was easy to get started with, and it found some interesting, and real, problems right away. After getting to understand the tool and how it worked, some cleanup and a fair amount of time invested by a smart, diligent and senior engineer to investigate false positives and setup filters on some of the checkers, we added FindBugs checking to our automated build process, and it continues to be our first line of defense in static analysis. The developers are used to checking the results of the FindBugs analysis daily, and we take all of the warnings that it reports seriously.

Later in 2006, as part of our work with Cigital to build a software security roadmap, we conducted a bake-off of static analysis tool vendors including Fortify, Coverity, Klocwork (who would probably get more business if they didn't have a cutesy name that is so hard to remember), and a leading Java development tools provider whose pre-sales support was so irredeemably pathetic that we could not get their product installed, never mind working. We did not include Ounce Labs at the time because of pricing, and because we ran out of gas, although I understand that they have a strong product.

As the NIST SAMATE study confirms, working with different tool vendors is a confusing and challenging and time-consuming process: the engines work differently, which is good since they catch different types of problems, but there is no consistency in the way that warnings are reported or rated, and different terms are used by different vendors to describe the same problem. And there is the significant problem of dealing with noise: handling the large number of false positives that get reported by all of the tools (some are better than others), understanding what to take seriously.

At the time of our initial evaluation, some of the tools were immature, especially the C/C++ tools that were being extended into the Java code checking space (Coverity and Klocwork). Fortify was the most professional and prepared of the suppliers. However, we were not able to take advantage of Fortify’s data flow and control flow analysis (one of the tool’s most powerful analysis capabilities) because of some characteristics of our software architecture. We verified with Fortify and Cigital consultants that it was not possible to take advantage of the tool’s flow analysis, even with custom rules extensions, without fundamentally changing our code. This left us with relying on the tool’s simpler security pattern analysis checkers, which did not uncover any material vulnerabilities. We decided that with these limitations, the investment in the tool was not justified.

Coverity’s Java checkers were also limited at that time. However, by mid 2007 they had improved the coverage and accuracy of their checkers, especially for security issues and race conditions checking, and generally improved the quality of their Java analysis product. We purchased a license for Coverity Prevent, and over a few months worked our way through the same process of learning the tool, reviewing and suppressing false positives, and integrating it into our build process. We also evaluated an early release of Coverity’s Dynamic Thread Analysis tool for Java: unfortunately the trial failed, as the product was not stable – however, it has potential, and we will consider looking at it again in the future when it matures.

Some of the developers use Klocwork Developer for Java, now re-branded as Klocwork Solo, on a more ad hoc basis: for small teams, the price is attractive, and it comes integrated into Eclipse.

In our build process we have other static analysis checks, including code complexity checking and other metric trend analysis using an open source tool JavaNCSS to help identify complex (and therefore high high-risk) sections of code, and we have built proprietary package dependency analysis checks into our build to prevent violation of dependency rules. One of our senior developers has now started working with Structure101 to help us get a better understanding of our code and package structure and how it is changing over time. And other developers use PMD to help cleanup code, and the static analysis checkers included in IntelliJ.

Each tool takes different approaches and has different strengths, and we have seen some benefits in using more than one tool as part of a defense-in-depth approach. While we find by far the most significant issues in manual code reviews or exploratory testing or through our automated regression safety net, the static analysis tools have been helpful in finding real problems.

While FindBugs does only “simple, shallow analysis of network security vulnerabilities", and analysis of malicious code vulnerabilities as security checks, it is good at finding small, stupid coding mistakes that escape other checks, and the engine continues to improve over time. This open source project deserves more credit: it offers incredible value to the Java development community, and anyone building code in Java that who does not take advantage of it is a fool.

Coverity reports generally few false positives, and is especially good for finding potential thread safety problems and null pointer (null return and forward null) conditions. It also comes with a good management infrastructure for trend analysis and review of findings. Klocwork is the most excitable and noisiest of all of our tools, but it includes some interesting checkers that are not available in the other tools – although after manual code reviews and checks by the other static analysis tools, there is rarely anything of significance left for it to consider.

But more than the problems that the tools find directly, the tools help to identify areas where we may need to look deeper: where the code is complex, or too smarty pants fancy, or otherwise smelly, and requires followup review. In our experience, if a mature analysis tool like FindBugs reports warnings that don’t make sense, it is often because it is confused by the code, which in turn is a sign that the code needs to be cleaned up. We have also seen the number of warnings reported decline over time as developers react to the “nanny effect” of the tools’ warnings, and change and improve their coding practices to avoid being nagged. And the final benefit of using these tools is that this frees up the developers to concentrate on higher-value work in their code reviews: they don’t have to spend so much time looking out for fussy, low-level coding mistakes, because the tools have found them already, so the developers can concentrate on more important and more fundamental issues like correctness, proper input validation and error handling, optimization, simplicity and maintainability.

While we are happy with the toolset we have in place today, I sometimes wonder whether we should beef up our tool-based code checking. But is it worth it?

In a presentation at this year’s JavaOne conference, Prof. Bill Pugh, the Father of FindBugs says that

“static analysis, at best, might catch 5-10% of your software quality problems.”

He goes on to say, however, that static analysis is 80+% effective at finding specific defects and cheaper than other techniques for catching these same defects – silent, nasty bugs and programming mistakes.

Prof. Pugh emphasizes that static analysis tools have value in a defense-in-depth strategy for quality and security, combined with other techniques; that “each technique is more efficient at finding some mistakes than others”; and that “each technique is subject to diminishing returns”.

In his opinion, “testing is far more valuable than static analysis”, and “FindBugs might be more useful as an untested code detector than a bug detector”. If FindBugs finds a bug, you have to ask: “Did anyone test that code”?”. In our experience, Prof Pugh’s FindBugs findings can be applied to the other static analysis tools as well.

As technologists, we are susceptible to the belief that technology can solve our problems – the classic “silver bullet” problem. When it comes to static analysis tools, you’d be foolish not to use a tool at all, but at the same time you’d be foolish to expect too much from them – or pay too much for them.

Wednesday, June 17, 2009

How long can this go on?

Our team delivers software iteratively and incrementally, and over the past 3 years we have experimented with longer (1-2 months) and shorter (1-2 week) iterations, adjusting to circumstances, looking for the proper balance between cost and control.

There are obvious costs in managing an iteration: startup activities (planning, prioritization, kickoff, securing the team's commitment to the goals of the release), technical overheads like source code and build management (branching and merging and associated controls), status reporting to stakeholders, end-to-end system and integration testing, and closure activities (retrospectives, resetting). We don’t just deliver “ship-quality” software at the end of an iteration: in almost every case we go all the way to releasing the code to production, so our costs also include packaging, change control, release management, security and operations reviews, documentation updates and release notes and training, certifications with partners, data conversion, rollback testing, and pre- and post-implementation operations support. Yep, that’s a lot of work.

All of these costs are balanced against control: our ability to manage and contain risks to the project, to the product, and to the organization. I explored how to manage risks through iterative, incremental development in an earlier post on risk management.

We’ve found that if an iteration is too long (a month or more), it is hard to defend the team from changes to priorities, to prevent new requirements from coming in and disrupting the team’s focus. And in a longer cycle, there are too many changes and fixes that need to be reviewed and tested, increasing the chance of mistakes or oversights or regressions.

Shorter releases are easier to manage because, of course, they are necessarily smaller. We can manage the pressure from the business-side for changes because of the fast delivery cycle (except for emergency hot fixes, we are usually able to convince our product owner, and ourselves, to wait for the next sprint since it is only a couple of weeks away) and it is easier for everyone to get their heads around what was changed in a release and how to verify it. And shorter cycles keep us closer to our customers, not only giving us faster feedback, but demonstrating constant value. I like to think of it as a “value pipeline”, continuously streaming business value to customers.

One of my favorite books on software project management, Johanna Rothman’s Manage It!, recommends making increments smaller to get faster feedback – feedback not just on the product, but on how you build it and how you can improve. The smaller the iteration, the easier to look at it from beginning-to-end and see where time is wasted, what works, what doesn’t, where time is being spent that isn’t expected.

“Shorter timeboxes will make the problems more obvious so you can solve them.”

Ms. Rothman recommends using the “Divide-by-Two Approach to Reduce Iteration Size”: if the iterations aren’t succeeding, divide the length in half, so 6 weeks becomes 3 weeks, and so on. Smaller iterations provide feedback - longer ones mask the problems.

Ms. Rothman also says that it is difficult to establish a rhythm for the team if iterations are too long. In “Selecting the Right Iteration Length for Your Software Development Process”, Mike Cohn of Mountain Goat Software examines the importance of establishing a rhythm in incremental development. He talks about the need for a sense of urgency: if an iteration is too long, it takes too much time for the team to “warm up” and take things seriously. Of course, this needs to be balanced against keeping the team in a constant state of emergency, and burning everyone out.

Some of the other factors that Mr. Cohn finds important in choosing an iteration length:
- how long can you go without introducing change – avoiding requirements churn during an iteration.
- if cycles are too short (for example, a week) small issues, like a key team member coming down with a cold, can throw the team’s rhythm off and impact delivery.

All of this supports our experience: shorter (but not too-short) cycles help establish a rhythm and build the team’s focus and commitment, constantly driving to delivering customer value. And shorter cycles help manage change and risk.

Now we are experimenting with an aggressive, fast-tracked delivery model: a 3-week end-to-end cycle, with software delivered to production every 2 weeks. The team starts work on designing and building the next release while the current release is in integration, packaging and rollout, overlapping development and release activities. Fast-tracking is difficult to manage, and can add risk if not done properly. But it does allow us to respond quickly to changing business demands and priorities, while giving us time for an intensive but efficient testing and release management process.

We'll review how this approach works over the next few months and change it as necessary, but we intend to continue with short increments. However, I am concerned about the longer-term risks, the potential future downsides to our rapid delivery model.

In The Decline and Fall of Agile James Shore argues that rapid cycling short cuts up-front design:

“Up-front design doesn't work when you're using short cycles, and Scrum doesn't provide a replacement. Without continuous, incremental design, Scrum teams quickly dig themselves a gigantic hole of technical debt. Two or three years later, I get a call--or one of my colleagues does. "Changes take too long and cost too much!" I hear. "Teach us about test-driven development, or pairing, or acceptance testing!" By that time, fixing the real problems requires paying back a lot of technical debt, and could take years.”

While Mr. Shore is specifically concerned about loose implementations of Scrum, and its lack of engineering practices compared with other approaches like XP (see also Martin Fowler of ThoughtWorks on the risks of incremental development without strong engineering discipline), the problem is a general one for teams working quickly, in short iterations: even with good engineering discipline, rapid cycling does not leave a lot of time for architecture, design and design reviews, test planning, security reviews... all of those quality gating activities that waterfall methods support. This is a challenge for secure software development, as there is little guidance available on effectively scaling software security SDLC practices to incremental, agile development methods, something that I will explore more later.

Trying to account for architecture and platform decisions and tooling and training in an upfront “iteration zero” isn’t enough, especially if your project is still going strong after 2 or 3 years. What I worry about (and I worry about a lot of things) is that, moving rapidly from sprint to sprint, the team cannot stop and look at the big picture, to properly re-assess architecture and platform technology decisions made earlier. Instead all the team has a chance to do is make incremental, smaller-scale improvements (tighten up the code here, clean up an interface there, upgrade some of the technology stack), which may leave fundamental questions unanswered, trading off short-term goals (deliver value, minimize the cost and risk of change) with longer-term costs and uncertainties.

One of the other factors that could affect quality in the longer term is the pressure on the team to deliver in a timebox. In Technical Debt: Warning Signs, Catherine Powell raises the concern that developers committing to a date may put schedule ahead of quality:

“Once you've committed to a release date and a feature set, it can be hard to change. And to change it because you really want to put a button on one more screen? Not likely. The "we have to ship on X because X is the date" mentality is very common (and rightly so - you can't be late forever because you're chasing perfection). However, to meet that date you're likely to cut corners, especially if you've underestimated how much time the feature really takes, or how much other stuff is going on.”

Finally, I am concerned that rapid cycling does not give the team sufficient opportunities to pause, to take a breath, to properly reset. If they are constantly moving heads down from one iteration to another, do team members really have a chance to reflect, understand and learn? One of the reasons that I maintain this blog is exactly for this: to explore problems and questions that my team and I face; to research, to look far back and far ahead, without having to focus on the goals and priorities of the next sprint.

These concerns, and others, are explored in Traps & Pitfalls of Agile Development - a Non-Contrarian View:

"Agile teams may be prone to rapid accumulation of technical debt. The accrual of technical debt can occur in a variety of ways. In a rush to completion, Iterative development is left out. Pieces get built (Incremental development) but rarely reworked. Design gets left out, possibly as a backlash to BDUF. In a rush to get started building software, sometimes preliminary design work is insufficient. Possibly too much hope is placed in refactoring. Refactoring gets left out. Refactoring is another form of rework that often is ignored in the rush to complete. In summary, the team may move too fast for it's own good."

Our team’s challenge is not just to deliver software quickly: like other teams that follow these practices, we’ve proven that we can do that. Our challenge is to deliver value consistently, at an extremely high level of quality and reliability, on a continual and sustainable basis. Each design and implementation decision has to be made carefully: if your customer's business depends on you making changes quickly and perfectly, without impacting their day-to-day operations, how much risk can you afford to take on to make changes today so that the system may be simpler and easier to change tomorrow, especially in today's business environment? It is a high stakes game we're playing. I recognize that this is a problem of debt management, and I'll explore the problems of technical debt and design debt more later.

The practices that we have followed have worked well for us so far. But is there a point where rapid development cycles, even when following good engineering practices, provide diminishing returns? When does the accumulation of design decisions made under time pressure, and conscious decisions to minimize the risk of change, add up to bigger problems? Does developing software in short cycles, with a short decision-making horizon, necessarily result in long-term debt?

Friday, May 8, 2009

OWASP SAMM Organizational Assessment

I recently completed a lightweight organizational assessment of our software security assurance program, using the OWASP SAMM assessment worksheets. This is the text of a message that I posted to the SAMM mailing list, summarizing my findings on working with the assessment model:

Completing the assessment was straightforward. I didn’t have problems understanding what was being asked or why; although there were a few questions for some Level 3 practices that were targeted to enterprise-level concerns (costing, central control, some of the auditing requirements, metrics) that I did not go into much, working from a small company perspective. Ok, I did have to look up "Likert" to verify what the metric ">3.0 Likert on usefulness of code review activities..." was measuring in CR1.

The assessment guided me in a structured way through a review of our application security assurance program, what we are doing well, where we need to invest more. Going through the assessment was also a good way to improve my understanding of the SAMM model, the practice areas and maturity ladder.

The findings from the assessment were generally in line with what we had already identified through our ad hoc internal analysis and our existing roadmap, at least in the areas of Construction and Verification practices.

I like that SAMM considers more than Construction and Verification, that it also addresses Governance (including training) in a simple and practical way (at least for level 1 and 2), as well as Deployment practices (vulnerability management, environment hardening, operational enablement), so I was able to take into account at least some of the requirements of operations without having to put on a different hat (ITIL, Visible Ops, SANS, and so on.). This should help teams work on the boundaries where software assurance ends and secure infrastructure management and operations begins, rather than throwing the problem over the fence to operations / your customer to take care of.

I also like that Vulnerability Management is considered an important part of the software assurance model. It makes sense to me that a consistent mechanism should be put into place for handling security risks, vulnerabilities and incidents, including incident management and root cause analysis, within the software development / project team structure; aligning this with what is done for secure deployment and operations of the system, and within the larger context of business risk management.

Using the assessment as a planning tool was also simple: building the assessment score card from the worksheets; identifying gaps and areas for development, and then referring to the roadmaps for examples, ideas where to prioritize effort. The sample roadmaps provided good coverage, at least for my purposes (building software for financial services customers). There are some good guidelines on handling outsourced scenarios in the roadmaps. It would be helpful if there were also guidelines on how to apply SAMM to agile development, for teams that follow lighter-weight methods and practices, maybe as part of the ISV roadmap. The case study example was also helpful.

The next challenge is hunting down resources from all of the good stuff available from OWASP to support improvement activities. I am happy to see that work has started to make this easier. I can see that there will be some interesting challenges in this. For example, Threat Assessment: SAMM TA1 recommends attack trees as a less formal alternative to STRIDE, Trike, etc., and provides a brief description of attack trees. Now I can google “attack trees” and find some helpful information out there in the wide world. But there is not much available in OWASP itself on attack trees, except for a note from some poor guy unsuccessfully searching for information on attack trees in CLASP at one time, and a thread in 2007 where Dinis Cruz said that Microsoft gave up on using attack trees in threat modelling because of complexity and difficulty reading them. Whoa pardner, this isn’t what I was hoping to find :-(

One of the keys to me, as a software development guy, not a security expert, is that I need clear and actionable guidance where possible. I have a lot of work to do. I really don’t want to get in the middle of the semantic war around Microsoft’s definition of Threat Modelling vs what other security experts define as Threat Modelling. And then there’s Attack Models and Architectural Risk Analysis (which don’t show up in SAMM, as far as I can see – indexing SAMM would help! - but of course are key to BSIMM).

Mapping resources from OWASP to SAMM practices in a consistent way will make a hard job, building secure software, much easier.