Software Systems Reliability and Resilience
Reliability and resilience are key properties of software systems. Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. It is arguably one of the most important qualities of a software system, since failures can render the whole system inoperative.
Software reliability engineering (SRE) includes:
- Software reliability measurement – estimation and prediction.
- Attributes and metrics of software design, development process, architecture and their impact on reliability.
- Use of the acquired knowledge to guide the design of software systems and development processes.
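As a small illustration of the estimation and prediction activities listed above, the sketch below assumes the simplest possible model: a constant failure rate λ (failures arriving as a homogeneous Poisson process). Under that assumption the reliability for an operating period t is R(t) = exp(-λt). The function names and the failure counts are hypothetical; real SRE practice often relies on richer models, such as the reliability growth models described in the STRAIT section.

```python
import math


def estimate_failure_rate(num_failures: int, operating_hours: float) -> float:
    """Point estimate of a constant failure rate (failures per hour)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return num_failures / operating_hours


def reliability(failure_rate: float, mission_hours: float) -> float:
    """R(t) = exp(-lambda * t): probability of no failure during mission_hours.

    Assumes failures follow a homogeneous Poisson process (constant rate).
    """
    return math.exp(-failure_rate * mission_hours)


# Hypothetical observation: 4 failures during 1000 hours of operation.
lam = estimate_failure_rate(4, 1000.0)
print(f"lambda = {lam:.4f} failures/h")
print(f"R(100 h) = {reliability(lam, 100.0):.3f}")  # prints R(100 h) = 0.670
```

The constant-rate assumption ignores the fact that faults are removed over time; that is exactly the effect reliability growth models try to capture.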
The SRE Process
STRAIT
We study ways to model and predict software failures in order to help build more reliable software systems. One of our main contributions is STRAIT, a tool for building and integrating software reliability growth models (SRGMs) that model the cumulative failures of software systems. SRGMs describe mathematically the pattern of fault detection and removal, and can be used to predict the future failures of a software system from its past failure history.
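To make the SRGM idea concrete, the sketch below fits one classic SRGM, the Goel-Okumoto model, whose mean value function m(t) = a(1 - e^(-bt)) gives the expected cumulative number of failures by time t (a is the expected total number of faults, b the per-fault detection rate). This is not STRAIT's code or API: the fitting strategy (closed-form least squares for a, grid search for b) is a deliberately simple, hypothetical choice, and the failure data in the usage example are made up.

```python
import math


def go_mean_value(t: float, a: float, b: float) -> float:
    """Goel-Okumoto mean value function: expected cumulative failures by time t."""
    return a * (1.0 - math.exp(-b * t))


def fit_goel_okumoto(times, cumulative_failures, b_grid=None):
    """Least-squares fit of (a, b) to cumulative failure counts.

    For a fixed b the model is linear in a, so the best a has a closed form;
    b is then chosen by a simple grid search over candidate values.
    """
    if b_grid is None:
        b_grid = [i / 1000 for i in range(1, 2001)]  # candidate b in (0, 2]
    best = None
    for b in b_grid:
        g = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(gi * gi for gi in g)
        if denom == 0:
            continue
        a = sum(y * gi for y, gi in zip(cumulative_failures, g)) / denom
        sse = sum((y - a * gi) ** 2 for y, gi in zip(cumulative_failures, g))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    if best is None:
        raise ValueError("could not fit the model to the given data")
    _, a, b = best
    return a, b


# Hypothetical weekly cumulative failure counts from an issue tracker.
weeks = list(range(1, 11))
observed = [10, 18, 26, 32, 37, 42, 45, 48, 50, 52]
a, b = fit_goel_okumoto(weeks, observed)
print(f"estimated total faults a = {a:.1f}, detection rate b = {b:.3f}")
print(f"expected cumulative failures by week 15: {go_mean_value(15, a, b):.1f}")
print(f"expected residual faults after week 10: {a - go_mean_value(10, a, b):.1f}")
```

The residual-fault estimate a - m(t) is the kind of prediction an SRGM enables: it suggests how many failures may still surface if the observed trend continues.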
Publications
- Chren, S., Micko, R., Buhnova, B., & Rossi, B. (2019). STRAIT: A tool for automated software reliability growth analysis. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) (pp. 105-110). IEEE. https://ieeexplore.ieee.org/abstract/document/8816793
- Mičko, R., Chren, S., & Rossi, B. (2022). Applicability of software reliability growth models to open source software. In 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) (pp. 255-262). IEEE. https://ieeexplore.ieee.org/abstract/document/10011522
Antifragility
Antifragility is a concept that extends resilience, introduced in 2012 by Professor Nassim Nicholas Taleb in his book "Antifragile: Things That Gain from Disorder". The idea behind antifragility is to welcome a certain level of stressors and perturbations and to actively exploit them to achieve better performance over a longer time horizon.
Our vision is to integrate antifragile mechanisms into critical infrastructure systems to improve the resistance of software systems to failures:
- Seek to inject volatility in critical infrastructure systems to expose their fragilities.
- Enable critical infrastructure systems to make autonomous decisions to move from stable to unstable conditions.
- Enable critical infrastructure systems to survive shocks and dynamically determine their fragilities.
- Use the fragilities of critical infrastructure systems to learn by doing how to thrive as threats vary.
- Exploit Artificial Intelligence (AI) to support the adaptability and evolution of critical infrastructure systems.
- Go beyond the traditional target of resilience by proactively self-adapting and self-organizing the behavior of critical infrastructure systems to changing conditions.
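The first goal above, injecting volatility to expose fragilities, can be illustrated with a minimal chaos-style experiment. The sketch wraps a call so that it fails with a given probability and then compares how often a caller without error handling fails versus one that retries. The decorator, the retry helper, and the sensor function are hypothetical illustrations, not part of the group's framework.

```python
import random
from functools import wraps


class InjectedFault(Exception):
    """Raised by the chaos wrapper to emulate a transient failure."""


def inject_volatility(probability, rng):
    """Hypothetical decorator: make the wrapped call fail with the given probability."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if rng.random() < probability:
                raise InjectedFault(f"injected fault in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def with_retries(func, attempts):
    """Call func, retrying on InjectedFault up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return func()
        except InjectedFault:
            if i == attempts - 1:
                raise


def run_experiment(caller, runs=1000):
    """Return the fraction of runs in which the caller failed."""
    failures = 0
    for _ in range(runs):
        try:
            caller()
        except InjectedFault:
            failures += 1
    return failures / runs


rng = random.Random(42)  # fixed seed so the experiment is reproducible


@inject_volatility(0.2, rng)
def read_sensor():
    return 21.5  # hypothetical sensor reading


fragile = run_experiment(read_sensor)
hardened = run_experiment(lambda: with_retries(read_sensor, attempts=3))
print(f"failure rate without retries: {fragile:.3f}")
print(f"failure rate with 3 retries:  {hardened:.3f}")
```

The gap between the two failure rates is what the injection "exposes": the caller without retries is fragile to transient faults. An antifragile system would go further and use such observations to adapt its own behavior.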
Proposed Architecture
To achieve the 4-S properties (self-configuration, self-healing, self-optimization, and self-protection), the proposed architecture combines three elements: the Akka framework, which manages the system according to a MAPE-K (Monitor, Analyze, Plan, Execute over shared Knowledge) cycle; a simulation environment based on discrete-event simulation; and chaos engineering integration to test the system against unknown unknowns.
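The proposed architecture relies on Akka actors; the following sketch is plain Python and only illustrates the control flow of a MAPE-K loop (covering self-healing only) driven by a discrete-event simulation in which components crash at random, emulating chaos experiments. All class and function names, the restart-everything policy, and the exponentially distributed fault arrivals are hypothetical assumptions, not the group's implementation.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge base (the K in MAPE-K)."""
    last_state: dict = field(default_factory=dict)  # component -> healthy?
    failures_seen: int = 0
    repairs_done: int = 0


class MapeKManager:
    """Autonomic manager running Monitor-Analyze-Plan-Execute cycles."""

    def __init__(self, knowledge: Knowledge) -> None:
        self.k = knowledge

    def monitor(self, system: dict) -> dict:
        # Take a snapshot of each component's health status.
        return dict(system)

    def analyze(self, snapshot: dict) -> list:
        # Symptom detection: which components are down?
        failed = [name for name, ok in snapshot.items() if not ok]
        self.k.failures_seen += len(failed)
        return failed

    def plan(self, failed: list) -> list:
        # Hypothetical self-healing policy: restart every failed component.
        return [("restart", name) for name in failed]

    def execute(self, system: dict, actions: list) -> None:
        for action, name in actions:
            if action == "restart":
                system[name] = True
                self.k.repairs_done += 1
        self.k.last_state = dict(system)

    def cycle(self, system: dict) -> None:
        snapshot = self.monitor(system)
        self.execute(system, self.plan(self.analyze(snapshot)))


def simulate(components, horizon, fault_rate, check_interval, seed=0):
    """Discrete-event simulation: random crashes vs. periodic MAPE-K cycles."""
    rng = random.Random(seed)
    system = {name: True for name in components}
    manager = MapeKManager(Knowledge())
    events = []  # min-heap of (time, sequence number, event kind)
    counter = 0

    def schedule(time, kind):
        nonlocal counter
        heapq.heappush(events, (time, counter, kind))
        counter += 1

    schedule(rng.expovariate(fault_rate), "fault")
    schedule(check_interval, "mape")
    while events:
        time, _, kind = heapq.heappop(events)
        if time > horizon:
            break
        if kind == "fault":
            # Emulated chaos experiment: crash one randomly chosen component.
            system[rng.choice(components)] = False
            schedule(time + rng.expovariate(fault_rate), "fault")
        else:
            manager.cycle(system)
            schedule(time + check_interval, "mape")
    return system, manager.k


final_state, knowledge = simulate(["sensor", "gateway", "controller"],
                                  horizon=100.0, fault_rate=0.5, check_interval=1.0)
print(f"failures detected: {knowledge.failures_seen}, repairs: {knowledge.repairs_done}")
```

Keeping the simulation clock in an event queue, rather than sleeping in real time, makes it cheap to replay many fault scenarios; in an actor-based design, each MAPE-K phase could instead be a separate actor exchanging messages.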
Publications
- Bangui, H., Buhnova, B., & Rossi, B. (2022). Shifting towards antifragile critical infrastructure systems. In Proceedings of the 7th International Conference on Internet of Things, Big Data and Security (IoTBDS).
- Bangui, H., Rossi, B., & Buhnova, B. (2022). A conceptual antifragile microservice framework for reshaping critical infrastructures. In Proceedings of the 38th IEEE International Conference on Software Maintenance and Evolution (ICSME).
- Mráz, M., Bangui, H., Rossi, B., & Bühnová, B. (2023). Adopting the actor model for antifragile serverless architectures. In Proceedings of the 18th International Conference on Software Technologies (ICSOFT). SciTePress.