Wednesday, July 28, 2010

Robustness and resilience of large distributed applications and networks

In the area of clouds and large distributed automation and control networks, we need to deal with a vast number of (growing) endpoints integrated in the (dynamic) system. It is probably a misconception to assume that all these peers could be protected comprehensively at any time. Hence, it must be an important objective that the protection of the entire system must not depend on the security status (pertaining integrity, confidentiality and availability) of each and every endpoint. In other words, a compromised node must not affect or infect the stability and protection of the entire distributed system. This shall be adressed in the system and security architecture and needs to be defined (and tested !) as a crucial requirement. A (layered) defense in depth, as a general design principle, can help to meet this requirement. In addition, intrusion detection, prevention and a quick isolation of the compromised node can help to minimize the overall impact. Plan for failure is the underlying principle to implement this efficiently. Beside these classical security precautions and controls, a robust design as well as adequate redandancy mechanisms for critical subsystems can support the system stability.