Software Systems Reliability and Resilience

Reliability and resilience are important properties of software systems. Software reliability, the subject of software reliability engineering (SRE), has been defined as the probability of failure-free software operation for a specified period of time in a specified environment. It is probably one of the most important qualities of a software system, because a lack of reliability can make the system inoperative.

SRE includes

  1. Software reliability measurement: estimation and prediction.
  2. Attributes and metrics of software design, development process, and architecture, and their impact on reliability.
  3. Use of the acquired knowledge to guide the design of software systems and development processes.
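As a small illustration of reliability measurement, the sketch below estimates a failure rate from observed failures and turns it into a reliability figure for a mission time. It assumes a constant failure rate (an exponential model), which is a simplifying assumption and not something this page prescribes; the function names and the numbers are hypothetical.

```python
import math


def estimate_failure_rate(failure_count, total_operating_hours):
    """Maximum-likelihood estimate of a constant failure rate (failures per hour)."""
    return failure_count / total_operating_hours


def reliability(t_hours, failure_rate):
    """Probability of failure-free operation for t_hours, assuming a constant failure rate."""
    return math.exp(-failure_rate * t_hours)


# Hypothetical observation: 4 failures over 2000 hours of operation.
rate = estimate_failure_rate(failure_count=4, total_operating_hours=2000.0)
r_24h = reliability(24.0, rate)   # chance of surviving one day without failure
mtbf = 1.0 / rate                 # mean time between failures, in hours
```

Under these assumptions the estimate covers the "estimation" part of item 1; prediction needs a model of how the failure rate changes over time.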

The SRE Process

[Figure: The SRE process]

STRAIT

We work on ways to model and predict software failures, helping to build more reliable software systems. One of our main contributions is the STRAIT tool, which builds and integrates software reliability growth models (SRGMs) to model the cumulative failures of software systems. SRGMs are models that describe, in mathematical form, the pattern of fault detection and removal; they can be used to predict the future failures of a software system from its failure history.
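To make the idea of an SRGM concrete, here is a minimal sketch that fits one well-known model, the Goel-Okumoto model, to cumulative failure counts. Its mean value function is mu(t) = a * (1 - exp(-b * t)), where a is the expected total number of failures and b is the detection rate. This is not STRAIT's interface; the function names, the grid-search fitting procedure, and the synthetic data are assumptions made for illustration only.

```python
import math


def go_mean(t, a, b):
    """Goel-Okumoto mean value function: expected cumulative failures by time t."""
    return a * (1.0 - math.exp(-b * t))


def fit_goel_okumoto(times, cum_failures, b_min=1e-4, b_max=10.0, steps=2000):
    """Least-squares fit of (a, b) using a log-spaced grid over b.

    For a fixed b the model is linear in a, so the best a has a closed form.
    """
    best = None
    for i in range(steps + 1):
        b = b_min * (b_max / b_min) ** (i / steps)
        f = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        a = sum(y * x for y, x in zip(cum_failures, f)) / denom
        sse = sum((y - a * x) ** 2 for y, x in zip(cum_failures, f))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    _, a, b = best
    return a, b


# Synthetic, noise-free failure history generated with a=100, b=0.05.
times = list(range(1, 41))
observed = [go_mean(t, 100.0, 0.05) for t in times]

a_hat, b_hat = fit_goel_okumoto(times, observed)
predicted_at_60 = go_mean(60, a_hat, b_hat)   # forecast of cumulative failures
remaining = a_hat - observed[-1]              # expected failures still undetected
```

With real data the counts are noisy, and a maximum-likelihood fit or a comparison across several candidate models is usually preferable to this simple grid search.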

Publications


Antifragility

Antifragility is an approach that goes beyond resilience, introduced in 2012 by Professor Nassim Nicholas Taleb in his book “Antifragile: Things That Gain from Disorder”. The idea behind antifragility is to accept some level of stressors and perturbation and to actively “employ” them to achieve better performance over a longer time horizon.

Our vision is to integrate antifragile mechanisms into critical infrastructure systems to improve the resistance of software systems to failures:

  • Seek to inject volatility in critical infrastructure systems to expose their fragilities.
  • Enable critical infrastructure systems to take autonomous decisions to move from stable to unstable conditions.
  • Enable critical infrastructure systems to survive shocks and dynamically determine their fragilities.
  • Use the fragilities of critical infrastructure systems to learn by doing how to thrive as threats vary.
  • Exploit Artificial Intelligence (AI) to support the adaptability and evolution of critical infrastructure systems.
  • Go beyond the traditional target of resilience by proactively self-adapting and self-organizing the behavior of critical infrastructure systems to changing conditions.
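The first goal above, injecting volatility to expose fragilities, can be sketched as a small fault-injection experiment. The code below is a toy illustration, not part of the proposed system: the wrapper, the retry helper, and the failure probability of 0.3 are all hypothetical. It compares how often a client succeeds with and without retries when calls fail at random.

```python
import random


class InjectedFault(Exception):
    """Raised by the fault injector instead of calling the real operation."""


def with_faults(operation, failure_probability, rng):
    """Wrap operation so that each call fails with the given probability."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_probability:
            raise InjectedFault("injected failure")
        return operation(*args, **kwargs)
    return wrapped


def call_with_retries(operation, attempts):
    """Return True if any of the attempts succeeds, False otherwise."""
    for _ in range(attempts):
        try:
            operation()
            return True
        except InjectedFault:
            continue
    return False


def success_rate(failure_probability, attempts, trials=1000, seed=42):
    """Fraction of trials in which the client succeeds under injected faults."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    flaky = with_faults(lambda: "ok", failure_probability, rng)
    successes = sum(call_with_retries(flaky, attempts) for _ in range(trials))
    return successes / trials


without_retry = success_rate(0.3, attempts=1)
with_retry = success_rate(0.3, attempts=3)
```

A gap between the two rates points to a fragility (a client without retries) that only becomes visible once faults are injected.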

Proposed Architecture

The goal is to achieve the four self-* (4-S) properties: self-configuration, self-healing, self-optimization, and self-protection. The proposed architecture is based on the Akka framework, which manages the system according to a MAPE-K cycle, a simulation environment based on discrete-event simulation, and a chaos engineering integration that checks the system for unknown unknowns.
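A MAPE-K cycle consists of Monitor, Analyze, Plan, and Execute phases that share a Knowledge base. The sketch below shows one way to structure such a loop in plain Python. It does not use Akka and is not the proposed architecture; the latency threshold, the replica-scaling action, and the toy system model are assumptions chosen only to make the loop runnable.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge base: goals, current configuration, and observations."""
    max_latency_ms: float = 200.0
    replicas: int = 1
    history: list = field(default_factory=list)


def monitor(system_state, knowledge):
    """Collect a measurement from the managed system and record it."""
    latency = system_state["latency_ms"]
    knowledge.history.append(latency)
    return latency


def analyze(latency_ms, knowledge):
    """Detect whether the latency goal is violated."""
    return latency_ms > knowledge.max_latency_ms


def plan(violation, knowledge):
    """Choose an adaptation: add one replica when the goal is violated."""
    return {"replicas": knowledge.replicas + 1} if violation else None


def execute(action, system_state, knowledge):
    """Apply the planned action to the managed system."""
    if action is None:
        return
    knowledge.replicas = action["replicas"]
    # Toy system model: latency is inversely proportional to replica count.
    system_state["latency_ms"] = system_state["base_latency_ms"] / knowledge.replicas


def mape_k_iteration(system_state, knowledge):
    """Run one full Monitor-Analyze-Plan-Execute pass."""
    latency = monitor(system_state, knowledge)
    violation = analyze(latency, knowledge)
    action = plan(violation, knowledge)
    execute(action, system_state, knowledge)


state = {"base_latency_ms": 600.0, "latency_ms": 600.0}
knowledge = Knowledge()
for _ in range(5):
    mape_k_iteration(state, knowledge)
```

In this toy run the loop scales out until the latency goal is met and then stops adapting. In an actor-based design each phase could be a separate actor exchanging messages, but that mapping is not shown here.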

[Figure: Proposed architecture]

Publications

Team members

Stanislav Chren
