The possible topics for the current academic year are the following:
The availability of GPUs is becoming critical for scientific research. Giovanni Mirarchi, a recent graduate of POLITO, carried out an extensive study on integrating Ray to support GPU-oriented jobs with an HPC-based approach. His thesis produced a working proof of concept, installed and running on a dedicated K3s cluster. This project aims at porting that solution to CrownLabs, creating all the configuration files (e.g., YAML manifests, ConfigMaps, etc.) required for students to use the system in the CrownLabs production cluster. The system is intended to be used from within a Python program, e.g., delivered through a Jupyter Notebook; hence, this work also includes the creation of an appropriate container, such as a Jupyter Notebook image, to be made available to students who need GPU resources.
Optionally, this project may also investigate how Kueue, or alternative solutions, can be integrated into the workflow to set proper queuing policies when the number of submitted jobs exceeds the number of available GPUs. Alternatively, the project may experiment with the NVIDIA KAI Scheduler, evaluating its effectiveness and integration potential within the same workflow, as a possible complement to or replacement for Kueue in managing GPU allocation and scheduling.
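The queuing problem described above can be illustrated with a toy model. The sketch below is plain Python and does not use the Kueue or KAI Scheduler APIs; the `Job` class, the `schedule_fifo` function and the job names are hypothetical. It assumes identical GPUs, exactly one GPU per job, and strict first-in-first-out admission, and it computes when each job would start if more jobs are submitted than GPUs are available.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str       # hypothetical job identifier
    duration: int   # run time in arbitrary time units


def schedule_fifo(jobs, num_gpus):
    """Return {job name: start time} for FIFO admission on num_gpus GPUs.

    Toy model: every job needs exactly one GPU and all jobs are
    submitted at t=0 in the order given.
    """
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    # Min-heap of the times at which each GPU becomes free.
    free_at = [0] * num_gpus
    heapq.heapify(free_at)
    start_times = {}
    for job in jobs:                      # strict submission order
        start = heapq.heappop(free_at)    # earliest available GPU
        start_times[job.name] = start
        heapq.heappush(free_at, start + job.duration)
    return start_times


submitted = [Job("train-a", 5), Job("train-b", 3), Job("train-c", 2), Job("train-d", 4)]
for name, start in schedule_fifo(submitted, num_gpus=2).items():
    print(f"{name} starts at t={start}")
```

With two GPUs, the first two jobs start immediately, while the other two wait until a GPU is released. A real queuing system would add policies this model ignores, such as priorities, quotas per user group, and preemption.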
Technologies: Kubernetes, Docker, Helm, ArgoCD, Git.
Student: Salvo P. (s354138); tutor(s): Attilio Oliva, Marco Miracapillo
Kubernetes, and many associated tools, export a large amount of information that is used for continuous monitoring and troubleshooting. However, information about the physical servers cannot be retrieved from within Kubernetes, although this data is exposed by the BMC interface, which is available on all servers for out-of-band (offline) management.
The project https://github.com/mrlhansen/idrac_exporter is a good example of software for Kubernetes environments that can interface with the BMC of the servers and retrieve information from it, such as the hardware state of each server and the energy it consumes.
This project aims at deploying this exporter on the CrownLabs cluster with the proper configuration (using ArgoCD) and making the relevant information available in Prometheus, while assessing the capabilities of the software (e.g., its support for multiple BMC interfaces, such as iLO, iDRAC, and others) and its readiness for production use (e.g., in terms of the quality of the exported data).
Finally, this project will create one or more Grafana dashboards to show the information gathered by the above software and make it easily readable by external users.
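One way to start assessing the quality of the exported data is to inspect an exporter's output directly. Exporters publish samples in the Prometheus text exposition format, and the sketch below parses a simplified subset of it using only the Python standard library. The metric names and label values in `SAMPLE` are hypothetical and are not taken from idrac_exporter; the parser ignores timestamps and does not unescape label values.

```python
import re

# Matches lines such as: metric_name{label="value",other="x"} 123.4
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical exporter output, for illustration only.
SAMPLE = """\
# HELP example_power_watts Power drawn by the server.
# TYPE example_power_watts gauge
example_power_watts{host="server-1"} 212.5
example_power_watts{host="server-2"} 187.0
example_health_ok{host="server-1",component="psu"} 1
"""


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank, HELP and TYPE lines
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue                      # skip lines this sketch cannot parse
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def total_for(samples, metric_name):
    """Sum the values of all samples that carry the given metric name."""
    return sum(value for name, _, value in samples if name == metric_name)


print(total_for(parse_metrics(SAMPLE), "example_power_watts"))
```

The same parsed samples can be used for simple sanity checks, for example verifying that every server reports a given metric or that values fall within a plausible range, before building dashboards on top of them.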
Technologies: Kubernetes, Docker, Helm, ArgoCD, IPMI, Grafana.
Student: Lorenzo D. (s349496); tutor(s): Stefano Galantino, Marco Miracapillo, Attilio Oliva
Giovanni Mirarchi, a recent graduate of POLITO, created a software tool to book physical resources, such as bare-metal servers. In short, this tool includes both an online reservation component and the server provisioning mechanism itself: when a user reserves a specific server, all its data is wiped and a clean version of the operating system is installed. We ask you to extend this system in the following ways:
extend the graphical interface to allow the user to choose among a set of operating systems to be installed;
extend the graphical interface to allow the user to choose an .ISO file from which to install their preferred operating system (which covers the case in which the desired operating system is not provided by default by the PROGNOSE platform);
extend the graphical interface to allow users to upload an arbitrary number of security keys (for SSH access);
extend the Kubernetes-based backend to perform the indicated actions.
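The extensions listed above imply that the backend must validate what the graphical interface submits. The sketch below shows one possible shape for such a check in plain Python. Everything in it is hypothetical and not part of the existing tool: the `ProvisioningRequest` class, the `SUPPORTED_OS` catalog, and the assumption that the uploaded keys are SSH public keys in the usual `<type> <base64> [comment]` form.

```python
import base64
import binascii
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical catalog of operating systems offered by default.
SUPPORTED_OS = {"ubuntu-24.04", "debian-12"}
KEY_TYPES = {"ssh-ed25519", "ssh-rsa", "ecdsa-sha2-nistp256"}


def looks_like_ssh_public_key(key: str) -> bool:
    """Cheap syntactic check: '<type> <base64 blob> [comment]'."""
    parts = key.split()
    if len(parts) < 2 or parts[0] not in KEY_TYPES:
        return False
    try:
        base64.b64decode(parts[1], validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


@dataclass
class ProvisioningRequest:
    os_name: Optional[str] = None    # entry from SUPPORTED_OS
    iso_name: Optional[str] = None   # name of a user-supplied ISO file
    ssh_keys: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if valid)."""
        errors = []
        if (self.os_name is None) == (self.iso_name is None):
            errors.append("choose exactly one of: catalog OS, custom ISO")
        if self.os_name is not None and self.os_name not in SUPPORTED_OS:
            errors.append(f"unsupported OS: {self.os_name}")
        if self.iso_name is not None and not self.iso_name.lower().endswith(".iso"):
            errors.append("custom image must be an .iso file")
        for index, key in enumerate(self.ssh_keys):
            if not looks_like_ssh_public_key(key):
                errors.append(f"SSH key #{index + 1} is not a valid public key")
        return errors


# Placeholder key material, not a real key.
demo_key = "ssh-ed25519 " + base64.b64encode(b"placeholder").decode() + " student@example"
request = ProvisioningRequest(os_name="debian-12", ssh_keys=[demo_key])
print(request.validate() or "request is valid")
```

Returning a list of problems, rather than raising on the first one, lets the interface show all issues to the user at once.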
Technologies: Kubernetes, Docker, Helm, ArgoCD, IPMI.
Student: Michele S. (s304046); tutor(s): Attilio Oliva, Stefano Galantino
Liqo is an open-source project, started at Politecnico di Torino, which recently originated a spin-off company, ArubaKube. Liqo enables multiple Kubernetes clusters to share their resources, which our University leverages to handle peaks of demand, such as during exams (see the blog post).
The Liqo project is the foundation of the EU-funded FLUIDOS project, which is currently developing extensions such as the REAR protocol (https://github.com/topix-hackademy/Rear/).
The Liqo Dashboard (live example - source code repository) is a small project that visualizes information about the peered clusters, such as the resources shared and/or consumed.
This project aims at extending the Liqo Dashboard to simplify (and enable) peering and un-peering directly from the web, e.g., by enabling the dynamic creation of a "token" that can be used to peer with the given cluster, albeit with limited resources. This should be integrated with the FLUIDOS architecture, which enables users to advertise a specific set of resources.
Optionally, the peering should be torn down after some time (e.g., days), in order to demonstrate the advantages of the technology while reducing the risk of its misuse.
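The idea of a token that grants a limited, time-bound peering can be sketched as follows. This is an illustration in plain Python and is not how Liqo or FLUIDOS issue credentials; the function names, the payload fields (`cluster`, `max_cpu`, `expires_at`) and the use of an HMAC-signed payload are all assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_token(secret: bytes, cluster_id: str, max_cpu: int,
                ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a signed token that embeds a resource limit and an expiry time."""
    now = time.time() if now is None else now
    payload = {"cluster": cluster_id, "max_cpu": max_cpu,
               "expires_at": now + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + signature


def verify_token(secret: bytes, token: str, now: Optional[float] = None):
    """Return the payload if the token is authentic and not expired, else None."""
    now = time.time() if now is None else now
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None                       # no signature part
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None                       # tampered or signed with another key
    payload = json.loads(base64.urlsafe_b64decode(body))
    if now >= payload["expires_at"]:
        return None                       # peering window is over
    return payload


secret = b"demo-secret"                   # placeholder, not a real key
token = issue_token(secret, "cluster-a", max_cpu=4, ttl_seconds=3 * 24 * 3600)
print(verify_token(secret, token))
```

In this model, tearing down the peering after a few days amounts to refusing tokens whose `expires_at` has passed; a real implementation would also need to remove the resources already granted to the peer.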
Technologies: Kubernetes, Go, JavaScript, React, Git.
Student: XXX; tutor: YYY