Sunday, May 1, 2016

Trusting, Trusting Trust

A long time ago Ken Thompson wrote something called Reflections on Trusting Trust. If you've never read this, go read it right now. It's short and it's something everyone needs to understand. The paper basically explains how Ken backdoored the compiler on a UNIX system in such a way that it was extremely hard to get rid of the backdoors (yes, more than one). His conclusion was that you can only trust code you wrote yourself. Given the nature of the world today, that's no longer an option.

Every now and then I have someone ask me about Debian's Reproducible Builds. There are other groups working on similar things, but these guys seem to be the furthest along. I want to make clear right away that this work being done is really cool and super important, but not exactly for the reasons people assume. The Debian page is good about explaining what's going on but I think it's easy to jump to some false conclusions on this one.

Firstly, the point of a reproducible build is to allow two different systems to build the exact same binary. This tells us that the resulting binary was not tampered with. It does not tell us the compiler is trustworthy or the thing we built is trustworthy. Just that the system used to build it was clean and the binary wasn't meddled with before it got to you.
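
To make that concrete, here is a minimal sketch of the check a reproducible build enables: hash the artifact produced on two different systems and compare. The function names and file paths are my own illustration, not anything taken from the Debian tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when two independently built artifacts are bit-for-bit identical."""
    return sha256_of(first) == sha256_of(second)
```

A match only says both builders turned the same source into the same bytes. If the source or the compiler was bad on both systems, the hashes will still happily agree.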

A lot of people assume a reproducible build means there can't be a backdoor in the binary. There can, because of how the supply chain works. Let's break this down into a few stages. In the universe of software creation and distribution there are literally thousands to millions of steps happening. From each commit, to releases, to builds, to consumption. It's pretty wild. We'll keep it high level.

Here are the places I will talk about. Each one of these could be a book, but I'll keep it short on purpose.
  1. Development: Creation of the code in question
  2. Release: Sending the code out into the world
  3. Build: Turning the code into a binary
  4. Compose: Including the binary in some larger project
  5. Consumption: Using the binary to do something useful

Development
The development stage of anything is possibly the hardest to control. We have reached a point in how we build software where development is really fast. I would expect any healthy project to have hundreds or thousands of commits every day. Even with code reviews and sign-offs, bugs can sneak in. A properly managed project will catch egregious attempts to insert a backdoor.

Release
This is the stage where the project in question cuts a release and puts it somewhere it can be downloaded. A good project will include a detached signature, which almost nobody checks. This stage of the trust chain has been attacked in the past. There are many instances of hacked mirrors serving up backdoored content. The detached signature ensures the release you downloaded is the one the project actually published. We mostly have trust solved here, which is why those signatures are so important.
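
Since almost nobody checks, here is a rough sketch of what checking looks like. It assumes GnuPG is installed and that you already imported the project's public key. The helper names are hypothetical, and the script just wraps the real `gpg --verify` command.

```python
import subprocess
from pathlib import Path


def gpg_verify_command(signature: Path, release: Path) -> list[str]:
    """Build the gpg command line that checks a detached signature."""
    return ["gpg", "--verify", str(signature), str(release)]


def release_signature_ok(signature: Path, release: Path) -> bool:
    """Run gpg and report whether it accepted the signature.

    Raises FileNotFoundError if gpg is not installed.
    """
    result = subprocess.run(
        gpg_verify_command(signature, release),
        capture_output=True,
        check=False,
    )
    # gpg exits with 0 only when the signature verifies against a known key.
    return result.returncode == 0
```

A passing check tells you the file matches what the key holder signed. Deciding whether that key really belongs to the project is still on you.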

Build
This is the stage where we take the source code and turn it into a binary. This is the step that a reproducible build project has injected trust into. Without a reproducible build stage, there was no real trust here. It's still sort of complicated though. If you've ever looked at the rules that trigger these builds, it wouldn't be very hard to violate trust there, so it's not bulletproof. It is a step in the right direction though.

Compose
This step is where we put a bunch of binaries together to make something useful. It's pretty rare for a single build to output the end result. I won't say it never happens, but it's a bit outside what we're worried about, so let's not dwell on it. The threat we see during this stage is the various libraries you bundle with your application. Do you know where they came from? Do they have some level of trust built in? At this point you could have a totally trustworthy chain of trust, but if you include a single bad library, it can undo everything. If you want to be as diligent as possible you won't ship things built by any third parties. If you build it all yourself, you can ensure some level of trust up to this point. Of course building everything yourself generally isn't practical. I think this is the next stage where we'll end up adding more trust. Various code scanners are trying to help here.
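
One simple way to know where your bundled libraries came from is to pin the digest of each one and refuse anything that doesn't match. This is only a sketch under my own assumptions: the function name, the flat directory layout, and the pin list format are all made up for illustration.

```python
import hashlib
from pathlib import Path


def unexpected_libraries(bundle_dir: Path, pinned: dict[str, str]) -> list[str]:
    """Return names of bundled files that are unpinned or whose digest differs."""
    problems = []
    for path in sorted(bundle_dir.iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # A missing pin and a mismatched pin are both treated as untrusted.
        if pinned.get(path.name) != digest:
            problems.append(path.name)
    return problems
```

An empty list means everything in the bundle matches a digest you recorded earlier. It says nothing about whether the library was trustworthy when you pinned it.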

Consumption
Here is where whatever you put together is used. In general nobody is looking for software; they want a solution to a problem they have. This stage can be the most complex and dangerous though. Even if you have done everything perfectly up to here, if whoever does the deployment makes a mistake it can open up substantial security problems. Better management tools can help with this step a lot.

The point of this article isn't to try to scare anyone (even though it is pretty scary if you really think about it). The real point is to stress that nobody can do this alone. There was once a time when a single group could plausibly try to own their entire development stack. Those times are long gone now. What you need to do is look at the above steps and decide where you want to draw your line. Do you have a supplier you can trust all the way to consumption? Do you only trust them for development and release? If you can't draw that line, you shouldn't be using that supplier. In most cases you have to draw the line at compose. If you don't trust what your supplier does beneath that stage, you need a new supplier. Demanding they give you reproducible builds isn't going to help you; they could still backdoor things during development or release. It's the old saying: Trust, but verify.

Let me know what you think. I'm @joshbressers on Twitter.