EPSRC CDT in Machine Learning Systems, University of Edinburgh



Machine Learning (ML) and Artificial Intelligence dramatically impact our lives, but ML performance depends on the systems that implement it. Systems research and ML research are symbiotic. Students joining this Centre for Doctoral Training will collaborate and carry out research at the cutting edge of the full ML-systems stack. We are about machine learning methods that work: we build the right methods, and the right ways to deploy them, to make ML work for real problems.

The School of Informatics and the University of Edinburgh have a long and prestigious history in both Artificial Intelligence and Computer Systems. We have a concentration of research across AI applications such as natural language, vision, robotics, and medicine.



The CDT is managed by Stephanie Robin, and is led by Amos Storkey (Director), Michael O'Boyle, Ajitha Rajan, and Luo Mai (co-Directors). It is supported by the Informatics Graduate School.


Prof. Michael O'Boyle

Company Engagement Chair


Dr. Ajitha Rajan

Student Development Chair


Dr. Luo Mai

Director of Training

CDT students are supervised by staff across the University. All four Directors supervise students on the CDT, along with many other staff; a non-exhaustive list is provided at the link below.

Example Projects


Novel Learning Approaches and Multi-Agent Learning for Large Neural Models

The current paradigm for learning networks of every sort is a variant of gradient-based learning. However, such learning is highly inefficient: each gradient step erases significant information learnt in the previous step. Furthermore, such learning processes cope poorly with distributed data and distributed learners. In this project we look beyond current slow gradient methods to new learning approaches with better theoretical properties than gradient methods, and consider information exchange between agents that substantially improves each agent's ability to optimise for its task. This is particularly valuable for edge-device learning.
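The interference between gradient steps can be seen in a toy experiment: train a linear model on one task, then continue training on a second, conflicting task, and the loss on the first task rises again. This is an illustrative sketch only (tasks, model, and hyperparameters are invented for the demonstration), not the project's proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two tasks whose optimal weights conflict for the same linear model.
w_a, w_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
X = rng.normal(size=(100, 2))
y_a, y_b = X @ w_a, X @ w_b

def sgd(w, X, y, lr=0.1, steps=200):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(2)
w = sgd(w, X, y_a)                 # learn task A
loss_a_before = loss(w, X, y_a)    # near zero
w = sgd(w, X, y_b)                 # continue on task B: steps overwrite A
loss_a_after = loss(w, X, y_a)     # task-A loss rises sharply
print(loss_a_before, loss_a_after)
```

The second phase of training discards what was learnt in the first, which is the inefficiency the project aims to move beyond.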



ServerlessLLM: Low-Latency Serverless Inference for Large Language Models

Large Language Models (LLMs) demand substantial GPU resources on online platforms, prompting service providers to investigate cost-effective serverless inference architectures in which dynamic workloads are consolidated onto shared GPU infrastructure. Consolidation, however, can inflate latency because models are frequently loaded from, and unloaded to, storage. Our ServerlessLLM is a low-latency serverless inference system tailored for LLMs: it combines an LLM checkpoint store with the first live migration algorithm for LLM inference.
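The core scheduling idea can be sketched as routing each request to a GPU that already holds the model's checkpoint, falling back to a costly cold load only when no warm replica exists. The names and cost figures below are illustrative assumptions, not the actual ServerlessLLM implementation.

```python
from dataclasses import dataclass, field

# Assumed illustrative costs of starting an inference on a GPU.
LOAD_COST_MS = 5000   # cold: pull checkpoint from storage
WARM_COST_MS = 50     # warm: checkpoint already resident in GPU memory

@dataclass
class GPU:
    name: str
    resident: set = field(default_factory=set)  # checkpoints held on this GPU

def schedule(gpus, model):
    """Return (gpu, startup_ms), preferring GPUs with a warm checkpoint."""
    for gpu in gpus:
        if model in gpu.resident:
            return gpu, WARM_COST_MS
    gpu = gpus[0]              # no warm replica: cold-load on some GPU
    gpu.resident.add(model)
    return gpu, LOAD_COST_MS

gpus = [GPU("gpu0", {"llama-7b"}), GPU("gpu1")]
g, cost = schedule(gpus, "llama-7b")    # warm hit on gpu0
g2, cost2 = schedule(gpus, "opt-13b")   # cold load
print(g.name, cost, g2.name, cost2)
```

Locality-aware placement like this is why a fast checkpoint store matters: it shrinks the cold-load penalty whenever a warm replica cannot be found.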


Testing Safety of Perception AI on Hardware Accelerators

Autonomous vehicles (AVs) are expected on our roads in the near future, yet concerns about their safety remain to be addressed. This project focuses on assessing the safety of perception AI tasks within AVs. Perception AI is responsible for detecting vehicles, pedestrians, lanes, traffic lights, and so on. Such tasks use deep learning, require enormous processing power, and rely on hardware accelerators like GPUs and FPGAs. Real-time failures can occur due to incorrect implementation on the hardware accelerators, leading to timing uncertainty, unsafe memory accesses, and incorrect data parallelism (see Figure). GPU-related bugs are one of the five categories of real faults in deep learning tasks such as object detection.
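One common way to catch such implementation faults is differential testing: run the same perception kernel on a trusted reference path and on the accelerated path, then flag divergence beyond a tolerance. The sketch below is illustrative only; `conv_fast` stands in for an accelerator implementation (here simulated with reduced float32 precision), not a real GPU or FPGA deployment.

```python
import numpy as np

def conv_reference(image, kernel):
    """Naive valid 2-D convolution used as the trusted oracle."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def conv_fast(image, kernel):
    """Stand-in for an accelerator path: same kernel, float32 arithmetic."""
    return conv_reference(image.astype(np.float32),
                          kernel.astype(np.float32)).astype(np.float64)

rng = np.random.default_rng(1)
img = rng.normal(size=(16, 16))
k = rng.normal(size=(3, 3))

ref = conv_reference(img, k)
fast = conv_fast(img, k)
max_err = float(np.max(np.abs(ref - fast)))
assert max_err < 1e-4, f"accelerator diverges from oracle: {max_err}"
```

A real harness would add timing checks and memory-safety instrumentation on the accelerator itself, since numerical agreement alone does not rule out timing uncertainty or unsafe memory accesses.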

Apply to join the ML Systems CDT


Informatics Forum, 10 Crichton Street, Edinburgh EH8 9AB.