“Fail-Safe” – When failure does NOT mean disaster

Robert Williams, spokesman for Entergy, said the four monitors were “showing spurious signals, but they were still able to perform their safety function.”

He added, “To be perfectly clear, the monitors did not fail. They did not fail; they were generating false signals.”

via Radiation monitor problem not “failure” Entergy says: Times Argus Online.

This is good news, to be sure. However, I think this statement begs to be ridiculed, and that Williams could have stated the truth much better. The fact is that the sensors produced signals that were not accurate; it’s just that this malfunction (or, indeed, failure) caused the system to assume a safe state, rather than leading to a wider system failure or, perhaps worse, a false sense of security. In this case, the “false sense” was one of alarm: the sensors showed a high level of radiation that was not, in fact, found to be present. That outcome is far better than the alternative, in which the sensors fail to detect, and so fail to report, high radiation levels that really are present. But they still, frankly, failed to perform their primary function.

This incident could have been used as an excellent “teachable moment” regarding the real meaning of the term “fail-safe.” This term is widely misunderstood to mean “safe against failure,” and often used mistakenly in place of “fool-proof.” It means nothing of the sort. “Fail-safe” describes a careful and deliberate design characteristic in which the failure of a component or portion of a system will cause the system to make a transition into a safe state. In other words, “fail-safe” design is the application of the old proverb, “better safe than sorry,” to the practice of systems design.
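To make the definition concrete, here is a minimal sketch in Python of what “failing safe” might look like for a monitor like the ones in the story. Everything in it is my own assumption for illustration: the names `State` and `evaluate`, the threshold value, and the validity checks are hypothetical and say nothing about how Entergy’s monitors actually work. The point is only that any reading the system cannot trust is treated the same way as a dangerous one.

```python
import math
from enum import Enum
from typing import Optional


class State(Enum):
    NORMAL = "normal"
    ALARM = "alarm"  # the designated safe state in this sketch


def evaluate(reading: Optional[float], threshold: float = 100.0) -> State:
    """Report NORMAL only for a trustworthy reading below the threshold.

    The threshold is an arbitrary illustrative number, not a real limit.
    """
    # A missing, non-finite, or negative reading means the sensor itself
    # cannot be trusted, so the system falls into the safe state.
    if reading is None or not math.isfinite(reading) or reading < 0:
        return State.ALARM
    # A trustworthy reading at or above the threshold is a genuine alarm.
    if reading >= threshold:
        return State.ALARM
    return State.NORMAL
```

Note the design choice: `NORMAL` is the one outcome that has to be earned. A broken sensor produces a nuisance alarm, which is annoying, instead of a comforting silence, which is dangerous.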

One of the most common fail-safe mechanisms can be found in any lawn mower manufactured in the last twenty years, more or less. The “dead-man switch” — that lever you need to hold in place while operating in order to keep the mower running — is a fail-safe feature. If the mower gets out of control, or the user trips and falls, the release of the dead-man switch causes the engine to stop running, hopefully avoiding the circumstance of leaving a “man” actually “dead.”
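The same idea shows up in software as a watchdog or heartbeat. The sketch below is a loose software analogy to the mower lever, not a description of any real mower controller; the class name `DeadManSwitch`, its methods, and the half-second timeout are all hypothetical. The engine is allowed to run only while a “lever held” signal keeps arriving, so silence of any kind stops it.

```python
import time


class DeadManSwitch:
    """Permit operation only while a 'lever held' signal keeps arriving."""

    def __init__(self, timeout_s=0.5, clock=time.monotonic):
        self._timeout_s = timeout_s  # illustrative value, not a real spec
        self._clock = clock          # injectable so the logic can be tested
        self._last_held = None       # no signal has been seen yet

    def lever_held(self):
        """Record that the operator is holding the lever right now."""
        self._last_held = self._clock()

    def engine_may_run(self):
        """Return True only if the lever was held within the timeout."""
        if self._last_held is None:
            return False  # never held: the default is the safe state
        return (self._clock() - self._last_held) <= self._timeout_s
```

Whether the operator lets go, the signal wire breaks, or the code reporting the lever crashes, the result is the same: the signal stops arriving and `engine_may_run()` returns `False`.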

All systems can fail. In fact, failure mode and effects analysis (FMEA) is one of the most important, as well as fascinating, aspects of engineering. Entire departments are dedicated to analyzing and predicting what would happen to the system in the event that any given part, or multiple parts, were to fail in some particular way. The idea, of course, is to design those parts and the overall system such that the failure of one or more parts will result in the system falling into a condition that will not cause further damage, or lead to injury (or worse) to its operators, the public, surrounding equipment and structures, the environment, etc.

So yes, I would argue that the monitors did in fact fail. But they “failed safe.” Which is exactly what they, and the system of which they are a critical part, were designed to do.
