Building Real Software: DevOpsDays: Empathy, Scaling, Docker, Dependencies and Secrets

Last week I attended DevOpsDays 2016 in Vancouver. I was impressed to see how strong the DevOps community has grown from the time that I attended my first DevOpsDays event in Mountain View in 2012. There were more than 350 attendees, all of them doing interesting and important work.

Here are the main themes that I followed at this conference:

Empathy – Humanizing Engineering and Ops

There was a strong thread running through the conference on the importance of the human side of engineering and operations, understanding and empathizing with people across the organization. There were two presentations specifically on empathy: one from an engineering perspective by Joyent’s Matthew Smillie, and another excellent presentation on the neuroscience of empathy by Dave Mangot at Librato, which explained how we are all built for empathy and that it is core to our survival. There was also a presentation on gender issues, and several breakout sessions on dealing with people issues and bringing new people into DevOps.

Another side to this was how we use tools to collaborate and build connections between people. More people are depending more on – and doing more with – chat systems like HipChat and Slack to do ChatOps. Using chat as a general interface to other tools, leveraging bots like Hubot to automatically trigger and guide actions, such as tracking releases and handling incidents.

In some organizations, standups are being replaced with Chatups, as people continue to find new ways to engage and connect with other people working remotely and inside and outside of teams.

Scaling DevOps

All kinds of organizations are dealing with scaling problems in DevOps.

Scaling their organizations. Dealing with DevOps at the extremes, at really large organizations and figuring out how to effectively do DevOps in small teams.

Scaling Continuous Delivery. Everyone is trying to push out more changes, faster and more often in order to reduce risk (by reducing the batch size of changes), increase engagement (for users and developers), and improve the quality of feedback. Some organizations are already reaching the point where they need to manage hundreds or thousands of pipelines, or optimize single pipelines shared by hundreds of engineers, building and shipping out changes (or newly baked containers) several times a day to many different environments.

A common story for CD as organizations scale up goes something like this:

Start out building a CD capability in an ad hoc way, using Jenkins and adding some plugins and writing custom scripts. Keep going until it can’t keep up.
Then buy and install a commercial enterprise CD toolset, transition over and run until it can’t keep up.
Finally, build your own custom CD server and move your build and test fleet to the cloud and keep going until your finance department shouts at you.

Scaling testing. Coming up with effective strategies for test automation where it adds most value – in unit testing (at the bottom of the test pyramid), and end-to-end system testing (at the top of the pyramid). Deciding where to invest your time. Understanding the tools and how to use them. What kind of tests are worth writing, and worth maintaining.

Scaling architecture. Which means more and more experiments with microservices.

Docker, Docker, Docker

Docker is everywhere. In pilots. In development environments. In test environments especially. And more often now, in production. Working with Docker, problems with Docker, and questions about Docker came up in many presentations, break outs and hallway discussions.

Docker is creating new problems at the start and end of the CD pipeline.

First, it moves configuration management upfront into the build step. Every change to the application or change to the stack that it is built and runs on requires you to “bake a new cake” (Diogenes Rettori at Openshift) and build up and ship out a new container. This places heavy demands on your build environment. You need to find effective and efficient ways to manage all of the layers in your containers, caching dependencies and images to make builds run fast.

Docker is also presenting new challenges at the production end. How do you track and manage and monitor clusters of containers as the application scales out? Kubernetes seems to be the tool of choice here.

Depending on Dependencies

More attention is turning to builds and dependency management, managing third party and open source dependencies. Identifying, streamlining and securing these dependencies.

Not just your applications and their direct dependencies – but all of the nested dependencies in all of the layers below (the software that your software depends on, and the software that this software depends on, and so on and so on). Especially for teams working with heavy stacks like Java.

There was a lot of discussion on the importance of tracking dependencies and managing your own dependency repositories, using tools like Archiva, Artifactory or Nexus, and private Docker registries. And stripping back unnecessary dependencies to reduce the attack surface and run-time footprint of VMs and containers. One organization does this by continuously cutting down build dependencies and spinning up test environments in Vagrant until things break.

Docker introduces some new challenges, by making dependency management seem simpler and more convenient, and giving developers more control over application dependencies – which is good for them, but not always good for security:

Containers are too fat by default - they include generic platform dependencies that you don’t need and - if you leave this up to developers - developer tools that you don’t want to have in production.
Containers are shipped with all of the dependencies baked in. Which means that as containers are put together and shipped around, you need to keep track of what versions of what images were built with what versions of what dependencies and when, where they have been shipped to, and what vulnerabilities need to be fixed.
Docker makes it easy to pull down pre-built images from public registries. Which means it is also easy to pull images that are out of date or that could contain malware.

You need to find a way to manage these risks without getting in the way and slowing down delivery. Container security tools like Twistlock can scan for vulnerabilities, provide visibility into run-time security risks, and enforce policies.

Keeping Secrets Secret

Docker, CD tooling, automated configuration management tools like Chef and Puppet and Ansible and other automated tooling create another set of challenges for ops and security: how to keep the credentials, keys and other secrets that these tools need safe. Keeping them out of code and scripts, out of configuration files, and out of environment variables.

This needs to be handled through code reviews, access control, encryption, auditing, frequent key rotation, and by using a secrets manager like Hashicorp’s Vault.

Passion, Patterns and Problems

I met a lot of interesting, smart people at this conference. I experienced a lot of sincere commitment and passion, excitement and energy. I learned about some cool ideas, new tools to use and patterns to follow (or to avoid).

And new problems that need to be solved.

2 comments:

Cengiz Kayay said...: In our enterprise, there are generic data services used by several departments. Each department asks for enhancements on these services.
There are times that the enhancements are finalised within the same week and to be released in that weeks release train.
But each department(user) wants to test it all to make sure the enhancement from other department did not break the service.
Thus, until both department's own QA agree on the results, the service can't be released. (Thus creating inter department decencies).
How to resolve this issue?
They ask me to split the service so that each channel has its own service to maintain.
But that is wrong approach as this will not be a generic micro service but project level service per user/customer. One option is to make one department to wait for the next release but that also delays the time to market.

Is there any options other than using feature toggles? I think, toggles are difficult to maintain...
Thanks...; July 27, 2017 at 12:58 PM
@philn5d said...: It might not be so wrong to split it if each channel has different needs. This sounds like a classic case of trying to maintain a global domain model. It does depend, but you are in an enterprise environment rather than producing for public consumption. I've found that different departments in an enterprise have different definitions and completely different use cases/logic for the same entity (Employee for example). Perhaps you can make channel services that extend existing services when it becomes so much of a problem - better than fighting to maintain a globally unified model with irrelevant data and properties on it. Hope that helps ease the pain :); September 29, 2017 at 3:36 PM