Michael Thompson, Director of Disaster Recovery & Business Continuity, Koch Business Solutions
Whenever I give a presentation on resiliency, I start by asking the audience what resiliency is, and I always receive a diverse set of responses. Coming from a control systems engineering background, I have treated resiliency as a cornerstone of practically every system I have ever designed, from nuclear power systems to wastewater treatment. Quite often I hear resiliency likened to armor, as attendees say, “We must strengthen this,” or “We should add another layer of protection here,” and I always use this as a segue to the following story.
During World War II the Allied forces were confronted with a big problem. The bombers they were sending out for combat missions were getting shot down or coming back torn to shreds by the German anti-aircraft weapons. It was clear that something needed to be done, and it was decided that the solution was to add more armor.
Then came the dilemma: you couldn’t simply add more armor to the entire plane. There was a delicate balance between the amount of armor you could place on an aircraft and its ability to remain airborne for extended periods of time. More armor meant more weight, and more weight meant a decrease in the aircraft’s flight duration and speed. So the question became: if we can’t armor the entire plane, where do we strategically add the armor?
The Center for Naval Analyses (Mangel, 1981) was conducting various studies and was painstakingly recording all battle damage on returning aircraft. They plotted the aggregated damage on a model such as the one below, and it quickly became evident to them that the enemy fire seemed to hit the fuselage, wing tips, and elevator disproportionately more than the other parts of the plane.
It was decided that the armor should be added to those areas.
Someone spoke up and said that perhaps it would be a good idea to get some input from the Statistical Research Group (SRG). The SRG was a highly classified program akin to the Manhattan Project, but instead of making weapons, its members used equations to increase the effectiveness of the military machine. Jacob Wolfowitz, for example, concluded that the special mixtures of ammunition favored by the ordnance sergeants were in fact not nearly as effective as an entire belt of Armor Piercing Incendiary rounds.
One of the young mathematicians who had joined the effort was a man by the name of Abraham Wald. Mr. Wald was already starting to get recognized for his contributions to the war effort, due to his ability to approach a problem differently. It was a popular running joke at the time that Mr. Wald’s work was so highly classified that the secretaries were under strict orders to snatch the paper out of his hand as soon as he had finished writing on it.
Mr. Wald was brought in to give his opinion on the data collected. The other researchers believed that this was an unnecessary extra step, as the data already showed where the armor should be added. What could this young man possibly add to such an apparent conclusion? Well, what happened next was the stuff of legend. Mr. Wald looked at the information, went to the diagram, and stated that they should add the armor to the spots where there were no bullet holes, specifically the engines and the cockpit.
The researchers scoffed. After all, what sense did that make? There were no holes in those areas. Mr. Wald pointed out a significant detail that the researchers had overlooked: they were making their decision based solely on the aircraft that made it back. Mr. Wald postulated that the “missing bullet holes” were in the engine and cockpit areas of the planes that did not make it back. The places where there were holes on the returning planes indicated that a plane could sustain damage in those areas and still make it back to base. The areas with the “missing bullet holes” represented critical spots that, if hit, resulted in a catastrophic failure, and therefore the plane did not return. Thousands of airmen owe a debt of gratitude to a second opinion. This story became a textbook example of the cognitive bias known as survivorship bias.
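For readers who like to see the effect in numbers, the reasoning can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a reconstruction of Mr. Wald’s actual analysis: the section names match the story, but the lethality figures, the number of planes, and the function names `simulate` and `least_observed` are all hypothetical and chosen purely for illustration. Hits are spread evenly across every section of the plane, yet the returning planes show the fewest holes exactly where a hit is most likely to be fatal.

```python
import random

# Hypothetical chance that a single hit in each section downs the plane.
# These values are invented for illustration; they are not historical data.
LETHALITY = {
    "engine": 0.60,
    "cockpit": 0.50,
    "fuselage": 0.05,
    "wing_tips": 0.02,
    "elevator": 0.05,
}


def simulate(num_planes=10_000, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits seen on returning planes) per section."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)

    for _ in range(num_planes):
        # Enemy fire lands on every section with equal probability.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every hit it took.
        survived = all(rng.random() >= LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1  # only these holes are ever recorded

    return all_hits, survivor_hits


def least_observed(survivor_hits, count=2):
    """Sections with the fewest holes on the planes that made it back."""
    return sorted(survivor_hits, key=survivor_hits.get)[:count]


if __name__ == "__main__":
    all_hits, survivor_hits = simulate()
    for section in LETHALITY:
        print(f"{section:10s} all={all_hits[section]:5d} "
              f"returned={survivor_hits[section]:5d}")
    print("Fewest holes on returning planes:", least_observed(survivor_hits))
```

Under these assumed numbers, an analyst who sees only the “returned” column would conclude that the engine and cockpit are rarely hit, when in truth they are hit just as often as everything else.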
Today, we cannot make our systems more resilient by simply armoring everything. Economic and operational constraints do not allow for it. One could even argue that adding unnecessary redundancy adds complexity and can inject risk; think of Charles Perrow’s theory of normal accidents in complex systems. When making decisions regarding redundancy, understand your tolerable risk. Understand your susceptibility to cognitive biases and decide based on critical thinking. Don’t be afraid to get a second opinion, and remember to always think about where you are putting your armor.