10 March, 2014

3-3-12 Improving systems: Normal accidents

Problems can, and inevitably will, happen at any step in a system, especially when the system is tightly coupled and interdependent.

The longer a system runs, the more risks accumulate and the more likely it is that problems will occur.
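
A rough way to see why risk accumulates, as a sketch only: assume (this is not from the lecture) that each step or run fails independently with the same small probability $p$. The chance of at least one failure over $n$ runs is then:

```latex
P(\text{at least one failure in } n \text{ runs}) = 1 - (1 - p)^{n}
% Illustrative numbers, chosen here as an assumption: p = 0.01, n = 100
% 1 - 0.99^{100} \approx 0.634
```

Even a 1% chance of failure per run adds up to roughly a 63% chance of at least one failure over 100 runs. In a tightly coupled system failures are usually not independent, so this should be read as an intuition aid rather than a model of the real system.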

-          Things go wrong all the time, and will keep going wrong
-          The best strategy for dealing with this is to:
-          Be psychologically prepared for accidents
-          Always run quality tests
-          Have experts prepared and ready to prevent and deal with accidents
-          Study previous accidents and learn from them to prevent more
-          Analyze close calls to figure out what may cause problems
-          Stress test the system (simulate failures in it to learn from them)
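
The stress-testing idea in the list can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from the lecture: the system is modeled as a chain of dependent steps, each failing independently with the same probability, and the names `run_pipeline` and `stress_test` are hypothetical.

```python
import random


def run_pipeline(steps, failure_prob, rng):
    """Run a chain of dependent steps.

    Returns the index of the first step that fails, or None if every
    step succeeds. Because the steps are coupled, one failure stops the run.
    """
    for i in range(steps):
        # Inject a simulated failure with the given probability (assumption:
        # every step has the same, independent failure probability).
        if rng.random() < failure_prob:
            return i
    return None


def stress_test(steps, failure_prob, trials, seed=0):
    """Simulate many runs and record where failures occur.

    Returns (share of runs that failed, list of failure counts per step).
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    failures_per_step = [0] * steps
    failed_runs = 0
    for _ in range(trials):
        failed_at = run_pipeline(steps, failure_prob, rng)
        if failed_at is not None:
            failed_runs += 1
            failures_per_step[failed_at] += 1
    return failed_runs / trials, failures_per_step


if __name__ == "__main__":
    rate, per_step = stress_test(steps=10, failure_prob=0.02, trials=10_000)
    print(f"share of runs with a failure: {rate:.3f}")
    print(f"failures by step: {per_step}")
```

Running it with more steps or a higher per-step failure probability shows how quickly the share of failed runs grows, and the per-step counts point to where a real investigation of close calls might start.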

Note that adding more to a system is not the way to make it better: the simpler and less complicated the system, the better.