Edinburgh EPSRC CDT in Machine Learning Systems

About

Machine Learning (ML) and Artificial Intelligence (AI) dramatically impact our lives. But ML performance depends on the systems that implement it: systems research and ML research are symbiotic. Those joining this Centre for Doctoral Training (CDT) will collaborate and carry out cutting-edge research across the full ML-systems stack. Our focus is machine learning methods that work. We build the right methods, and the right ways of deploying them, to make ML work for real problems.

The School of Informatics and the University of Edinburgh have a long and prestigious history in both Artificial Intelligence and Computer Systems. We have a concentration of research across AI applications such as natural language, vision, robotics, and medicine.

People

The CDT is managed by Stephanie Robin, and is led by Amos Storkey (Director), Michael O'Boyle, Ajitha Rajan, and Luo Mai (co-Directors). It is supported by the Informatics Graduate School.

Prof. Amos Storkey

Director

Prof. Michael O'Boyle

Company Engagement Chair

Prof. Ajitha Rajan

Student Development Chair

Dr. Luo Mai

Director of Training

Stephanie Robin

Manager

Jack Lee

Administrator

CDT students are supervised by staff across the University. All four Directors are supervisors on the CDT, along with many other staff; a non-exhaustive list is available via the link below.

Supervisors

Example Projects

CDT Programme

The CDT Programme provides a range of training opportunities, as well as the chance to engage in a number of projects. Each student will have a main research project, which will generally involve collaboration with others. There are also mini-projects, hackathons, and company engagement. BonsApps provides training in creating machine learning artifacts that target business problems. Most students will take up the opportunity of a paid internship in a company over the course of the PhD.

To provide students with the necessary background, the CDT runs a number of courses for its students. The course selection is tailored to each PhD student's needs.

Example research projects are given below, but most relevant research projects would be appropriate, so long as they are agreed between student and supervisor. For projects that overlap multiple Edinburgh CDTs, we advise applying to the most relevant one.

Novel Learning Approaches and Multi-Agent Learning for Large Neural Models

The current paradigm for training neural networks of every sort is some variant of gradient-based learning. However, such learning is highly inefficient: each gradient step erases significant information learnt in the previous step. Furthermore, such learning processes cope poorly with distributed data and distributed learners. In this project we look beyond current slow gradient methods to new learning approaches with better theoretical properties, and consider informational transactions between agents that greatly improve each agent's ability to optimize for its task. This is particularly valuable for edge-device learning.

ServerlessLLM

Large Language Models (LLMs) demand substantial GPU resources for online platforms, prompting service providers to investigate cost-effective serverless inference architectures, in which dynamic workloads are consolidated efficiently onto shared GPU infrastructure. However, this can increase latency, because models are frequently loaded from and unloaded to storage. ServerlessLLM is our low-latency serverless inference system tailored for LLMs. It includes an LLM checkpoint store and the first live migration algorithm for LLM inference.

Testing Safety of Perception AI on Hardware Accelerators

Autonomous vehicles (AVs) are expected to become a reality in the near future, yet concerns about their safety remain to be addressed. This project focuses on assessing the safety of perception AI tasks within AVs. Perception AI is responsible for detecting vehicles, pedestrians, lanes, traffic lights, etc. Such tasks use deep learning, require enormous processing power, and rely on hardware accelerators such as GPUs and FPGAs. Real-time failures can occur due to incorrect implementation on the hardware accelerators, leading to timing uncertainty, unsafe memory accesses, and incorrect data parallelism (see Figure). GPU-related bugs are one of the five categories of real faults in deep learning tasks such as object detection.

EPSRC CDT in Machine Learning Systems, University of Edinburgh