Faculty of Electrical Engineering and Information Technology: Open Positions

We are always looking for highly motivated students to strengthen our team in the areas of Approximate Computing, Machine Learning, Design Automation for Emerging Technologies, Reliability, and Fault Tolerance.

Summer 2025 internships

The internships run from June to December 2025. The application deadline is Feb. 28th, 2025. Interested students should send their applications to Behnaz Ranjbar.

Prototype Development for High-Quality Photography in Complex Environments

High-quality photography is critical in medical applications, particularly in environments with limited space, poor lighting, and reflective surfaces. This project focuses on developing an embedded camera system that standardizes and simplifies the capture of consistent, high-quality images for reliable documentation and analysis, even under challenging conditions.

The role will involve concept development, hardware-software co-design, and embedded system implementation, including the integration of sensors, image processing algorithms, and control systems. The goal is to deliver a functional embedded prototype that is technically robust, energy-efficient, and user-friendly, suitable for real-world applications in medical and other demanding environments. Key tasks include:

Designing and optimizing embedded hardware for camera control and image acquisition.
Implementing real-time image enhancement and processing algorithms on resource-constrained platforms.
Integrating sensors, lighting modules, and control systems for adaptive image capture.
Ensuring energy efficiency, low-latency operation, and robustness in medical and other demanding environments.
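As an illustration of the kind of real-time image enhancement step listed above, the following minimal Python sketch applies global histogram equalization, one classic contrast-enhancement technique, to an 8-bit grayscale frame stored as a flat list of pixel values. The single-channel layout and the function name `equalize` are assumptions for illustration; this is not the project's processing pipeline, and an embedded implementation would typically be written in C/C++ or mapped to hardware.

```python
from typing import List


def equalize(pixels: List[int], levels: int = 256) -> List[int]:
    """Global histogram equalization of a single-channel image given as a
    flat list of intensities in [0, levels - 1]."""
    if not pixels:
        return []
    # 1. Histogram of intensity values.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # 2. Cumulative distribution function (CDF) of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # only one intensity present: nothing to stretch
        return list(pixels)
    # 3. Lookup table mapping each input level to an equalized output level.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[p] for p in pixels]


frame = [50, 50, 100, 150]
print(equalize(frame))  # [0, 0, 128, 255]
```

Building the lookup table once per frame reduces the per-pixel work to a single table access, which suits resource-constrained platforms.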

Contact information for more details: Prof. Akash Kumar

Approximation of Machine Learning Models for High-Throughput, Energy-Efficient, and Sustainable Computing in the 5G/6G Era

To improve the energy consumption and/or the response time of ML applications, various computing approaches have emerged in the 5G/6G era, including Federated Learning, Distributed Inference, and In-Network Computing. However, to enable the execution of cutting-edge, compute-intensive models (e.g., LLMs and DNNs) on resource-constrained devices across the edge-to-cloud continuum, the structure of such models must be optimized without compromising the final quality of results. In this context, Approximate Computing techniques have been shown to be highly beneficial, as they exploit the inherent error resilience of ML models. Building on this potential, the main idea of this project is to find and apply a combination of suitable approximation techniques that reduce the area, power, and energy of ML models and boost their performance while satisfying the users' accuracy requirements.

Required skills

FPGA development and programming: Verilog or VHDL, C++, and Python
High-Level Synthesis: Vivado and Vitis HLS
ML: TensorFlow and/or PyTorch; ability to modify the structure of ML models (NNs, LLMs, etc.) by applying techniques such as layer-wise quantization and pruning.
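Layer-wise quantization and pruning, mentioned in the skills above, can be illustrated without any framework. The following is a minimal pure-Python sketch assuming per-layer symmetric uniform quantization and unstructured magnitude pruning. The layer names, weights, and per-layer (bits, sparsity) settings are hypothetical, and the weight reconstruction error (MSE) only stands in for the accuracy evaluation that would be performed on the real model in TensorFlow or PyTorch.

```python
from typing import Dict, List, Tuple


def quantize_symmetric(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric quantization of one layer: w ~ scale * q with
    q in [-(2^(bits-1) - 1), 2^(bits-1) - 1]."""
    qmax = (1 << (bits - 1)) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Set the fraction `sparsity` of weights with the smallest magnitude to zero."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def compress(layers: Dict[str, List[float]],
             config: Dict[str, Tuple[int, float]]) -> Dict[str, Tuple[int, float, float]]:
    """Prune, then quantize each layer with its own (bits, sparsity) setting;
    report the mean squared error of the reconstructed weights."""
    report = {}
    for name, w in layers.items():
        bits, sparsity = config[name]
        q, scale = quantize_symmetric(prune_by_magnitude(w, sparsity), bits)
        w_hat = [scale * v for v in q]
        mse = sum((a - b) ** 2 for a, b in zip(w, w_hat)) / len(w)
        report[name] = (bits, sparsity, mse)
    return report


layers = {"conv1": [0.8, -0.3, 0.05, 0.6], "fc": [0.02, -0.9, 0.4, -0.01]}
config = {"conv1": (8, 0.0), "fc": (4, 0.5)}  # hypothetical (bits, sparsity) per layer
for name, (bits, sparsity, mse) in compress(layers, config).items():
    print(f"{name}: {bits}-bit, {sparsity:.0%} pruned, weight MSE = {mse:.2e}")
```

Choosing a different setting per layer is what makes the approach layer-wise: sensitive layers can keep more bits while tolerant layers are pruned or quantized more aggressively.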

Contact information for more details: Zahra Ebrahimi

Employing Reinforcement Learning to Design FPGA-optimized Approximate Operators

The run-time reconfigurability and high parallelism of FPGAs make them an attractive platform for hardware accelerators for ML algorithms. Because ML algorithms are inherently error-resilient, such accelerators can be made approximate, trading output accuracy for better overall performance. As multiplication and addition are the two main arithmetic operations in ML algorithms, most state-of-the-art approximate accelerators use approximate architectures for these operations. However, these works have mainly explored and selected approximate operators from an existing set of designs. In contrast, this project focuses on designing a reinforcement learning (RL)-based framework for synthesizing and implementing novel approximate operators. RL is a type of machine learning in which an agent learns to perform actions in an environment to maximize a reward signal. In this project, RL-based techniques are expected to yield approximate operators with better accuracy-performance trade-offs.
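To make the RL idea more concrete, the following sketch uses the simplest RL setting, an epsilon-greedy multi-armed bandit, to choose how many low-order partial-product columns a small unsigned multiplier may drop. It is a hypothetical toy, not the project's framework: the truncation-based operator, the partial-product count used as a cost proxy, the reward weight `w`, and the 4-bit operand width are all assumptions for illustration. A real flow would derive cost from FPGA synthesis results (e.g., LUT count, delay, power) rather than from a bit count.

```python
import random

N_BITS = 4  # operand width (illustrative assumption)


def approx_mul(a: int, b: int, n: int, k: int) -> int:
    """Unsigned n-bit multiply that omits every partial-product bit a_i*b_j
    in a column i + j below k (k = 0 gives the exact product)."""
    result = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                if (b >> j) & 1 and i + j >= k:
                    result += 1 << (i + j)
    return result


def kept_bits(n: int, k: int) -> int:
    """Hypothetical cost proxy: number of partial-product bits still computed."""
    return sum(1 for i in range(n) for j in range(n) if i + j >= k)


def reward(k: int, n: int, rng: random.Random, samples: int = 32, w: float = 0.5) -> float:
    """Noisy reward from randomly sampled operands:
    -(normalised mean absolute error) - w * (normalised cost)."""
    max_val = (1 << n) - 1
    total_err = 0
    for _ in range(samples):
        a, b = rng.randint(0, max_val), rng.randint(0, max_val)
        total_err += a * b - approx_mul(a, b, n, k)  # dropping bits never overshoots
    mae = total_err / samples / (max_val * max_val)
    cost = kept_bits(n, k) / (n * n)
    return -mae - w * cost


def epsilon_greedy(n: int, episodes: int = 500, eps: float = 0.1, seed: int = 0) -> int:
    """Epsilon-greedy bandit over truncation depths k = 0 .. 2n - 1;
    returns the depth with the highest estimated mean reward."""
    rng = random.Random(seed)
    arms = list(range(2 * n))
    q = [reward(k, n, rng) for k in arms]  # pull every arm once to initialise
    counts = [1] * len(arms)
    for _ in range(episodes):
        if rng.random() < eps:
            arm = rng.choice(arms)                # explore
        else:
            arm = max(arms, key=lambda k: q[k])   # exploit current estimate
        r = reward(arm, n, rng)
        counts[arm] += 1
        q[arm] += (r - q[arm]) / counts[arm]      # incremental mean update
    return max(arms, key=lambda k: q[k])


best_k = epsilon_greedy(N_BITS)
print("selected truncation depth k =", best_k)
```

A bandit has no state, so it can only pick among predefined configurations; synthesizing genuinely new operators, as the project proposes, calls for richer action spaces (e.g., per-bit decisions), which this sketch does not attempt.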

Pre-requisites:
- Digital Design, FPGA-based accelerator design
- Python, TCL
- Some knowledge of ML algorithms
Skills that will be acquired during project work:
- ML for EDA
- Multi-objective optimization of hardware accelerators.
- Technical writing for research publications.
Related Publications:
- S. Ullah, S. S. Sahoo, and A. Kumar, "CoOAx: Correlation-aware Synthesis of FPGA-based Approximate Operators," Proceedings of the Great Lakes Symposium on VLSI 2023, 2023.
- S. Ullah, S. S. Sahoo, N. Ahmed, D. Chaudhury, and A. Kumar, "AppAxO: Designing Application-specific Approximate Operators for FPGA-based Embedded Systems," ACM Transactions on Embedded Computing Systems (TECS), vol. 21, no. 3, pp. 1-31, 2022.
- S. Ullah, S. S. Sahoo, and A. Kumar, "CLAppED: A Design Framework for Implementing Cross-Layer Approximation in FPGA-based Embedded Systems," Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 1-6, Jul. 2021.
Contact: Salim Ullah

Machine-Learning Techniques Analysis for Embedded Real-Time System Design

In general, ML techniques fall into three categories (supervised learning, unsupervised learning, and reinforcement learning), and depending on the problem, parameters, and inputs, only some of them are suitable for optimizing system properties. Many ML techniques are memory-intensive and computationally expensive, and their overheads can make them incompatible with real-time system design because they may jeopardize applications' timeliness. Therefore, this project aims to analyze and investigate various ML techniques in terms of overhead, accuracy, and capability, and to identify the efficient ones that are suitable for embedded real-time systems.
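Part of such an analysis is checking whether a technique's inference overhead fits a task's timing budget. The following Python sketch times two hypothetical models, a constant-cost linear model and a memory-hungry 1-nearest-neighbour classifier, and compares the largest observed latency against a deadline. The models, data sizes, and the 1 ms deadline are assumptions for illustration. Maximum latencies observed on a desktop interpreter are measurements, not worst-case execution time (WCET) bounds, which real-time analysis on the embedded target would require.

```python
import random
import time
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def linear_model(weights: Vector) -> Model:
    """Constant-cost model: one dot product per prediction."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x))


def nearest_neighbour_model(train_x: List[Vector], train_y: List[float]) -> Model:
    """Memory-heavy model: every prediction scans the whole training set."""
    def predict(x: Vector) -> float:
        dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict


def max_latency(model: Model, inputs: List[Vector], runs: int = 3) -> float:
    """Largest observed latency in seconds (a measurement, not a WCET bound)."""
    worst = 0.0
    for _ in range(runs):
        for x in inputs:
            start = time.perf_counter()
            model(x)
            worst = max(worst, time.perf_counter() - start)
    return worst


def check_deadline(models: Dict[str, Model], inputs: List[Vector],
                   deadline_s: float) -> Dict[str, bool]:
    """Map each model name to whether its observed maximum meets the deadline."""
    return {name: max_latency(m, inputs) <= deadline_s for name, m in models.items()}


rng = random.Random(0)
dim = 8
train_x = [[rng.random() for _ in range(dim)] for _ in range(2000)]
train_y = [float(rng.randint(0, 1)) for _ in train_x]
queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
models = {
    "linear": linear_model([rng.random() for _ in range(dim)]),
    "1-NN": nearest_neighbour_model(train_x, train_y),
}
print(check_deadline(models, queries, deadline_s=1e-3))
```

The nearest-neighbour model's cost grows with the stored training set, which is one concrete way a technique's memory demands turn into a timing problem.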

Pre-Requisites
- Proficiency in C++, Python, Matlab
- Knowledge about Machine Learning techniques
- Good knowledge of computer architecture and algorithm design
Related Publications:
- S. Pagani, P. D. S. Manoj, A. Jantsch and J. Henkel, "Machine Learning for Power, Energy, and Thermal Management on Multicore Processors: A Survey," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 39, no. 1, pp. 101-116, 2020.
Contact: Behnaz Ranjbar

Extending ML Hardware Generators with Approximate Operators

Various open-source tools, such as HLS4ML, FINN, and Tensil AI, enable fast FPGA implementation of machine learning algorithms. In the quest for efficient FPGA-based hardware accelerators for ML algorithms, the inherent error resilience of ML algorithms can be exploited to implement approximate hardware accelerators that trade output accuracy for better overall performance. However, the available hardware generators only produce implementations that use accurate arithmetic operators. This short project focuses on extending these frameworks to support approximate arithmetic operators for various operations in the generated RTL of the algorithm. To this end, our chair offers extended libraries of FPGA-optimized approximate operators, which will be used in the project.
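The core idea of the extension, letting the generated design use an approximate operator wherever it would otherwise instantiate an exact one, can be modelled functionally before touching any generator. The following is a hypothetical Python model that does not use the HLS4ML, FINN, or Tensil AI APIs: an integer fully connected layer takes its multiplier as a parameter, and a simple operand-truncation multiplier, which clears the k least-significant bits of both operands, stands in for an operator from an approximate-operator library.

```python
from typing import Callable, List

Mul = Callable[[int, int], int]


def exact_mul(a: int, b: int) -> int:
    return a * b


def make_truncating_mul(k: int) -> Mul:
    """Approximate operator: clear the k least-significant bits of both
    operands (two's-complement view) before multiplying."""
    mask = ~((1 << k) - 1)
    return lambda a, b: (a & mask) * (b & mask)


def dense(x: List[int], w: List[List[int]], bias: List[int], mul: Mul) -> List[int]:
    """Integer fully connected layer; `mul` plays the role of the arithmetic
    operator a generator would instantiate for this layer."""
    return [sum(mul(xi, wi) for xi, wi in zip(x, row)) + b for row, b in zip(w, bias)]


x = [12, -7, 33, 5]
w = [[3, -2, 1, 4], [-5, 6, 0, 2]]
bias = [10, -3]
print(dense(x, w, bias, exact_mul))               # [113, -95]
print(dense(x, w, bias, make_truncating_mul(2)))  # [58, -131]
```

The large deviation for k = 2 on these small operands shows why the operator choice has to be made per layer and validated against the model's accuracy rather than applied uniformly.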

Pre-requisites:
- Digital Design, FPGA-based accelerator design, High-level Synthesis
- Python, C++
Skills that will be acquired during project work:
- Hardware design for ML
- System-level design
- Technical writing for research publications.
Related Publications:
- S. Ullah, S. S. Sahoo, N. Ahmed, D. Chaudhury, and A. Kumar, "AppAxO: Designing Application-specific Approximate Operators for FPGA-based Embedded Systems," ACM Transactions on Embedded Computing Systems (TECS), vol. 21, no. 3, pp. 1-31, 2022.
Contact: Salim Ullah

FPGA-based Cycle Accurate Emulator for Exploring the Emerging Non-volatile Memories

The commonly used Dynamic Random-access Memory (DRAM)- and Static Random-access Memory (SRAM)-based solutions fall short of the memory requirements (capacity, latency, and energy) of modern applications and computing systems. Emerging Non-volatile Memory (NVM) technologies, such as Spin Transfer Torque Random Access Memory (STT-RAM), Phase-change Random-access Memory (PCRAM), Resistive Random-access Memory (ReRAM), and Racetrack Memory (RTM), offer a promising way to overcome these bottlenecks. NVMs offer better density and energy efficiency than SRAM- and DRAM-based technologies. However, they also have limitations, such as variable access latency and costly write operations, which limit the achievable memory performance improvement. Furthermore, NVM technologies are still in an exploratory phase compared to SRAM technology. These challenges open a research space for exploring hardware and software architectures that mitigate them and enable memory hierarchies based on hybrid technologies to improve memory system performance.
In this work, we will focus on implementing an FPGA-based cycle-accurate emulator that enables quick analysis of how various NVM-based caches affect the overall performance of a computing system. The developed emulator will be evaluated on various benchmark applications.
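The effect of asymmetric NVM access latencies can be previewed with a small behavioural model before building the FPGA emulator. The following Python sketch simulates a direct-mapped, write-allocate cache and charges each access the technology's read or write latency; a miss adds a fixed penalty plus one line fill, modelled as an array write. All latency values are hypothetical placeholders, not measured SRAM or STT-RAM figures, and the model ignores write-backs of dirty lines, which a cycle-accurate emulator would have to include.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Tech:
    name: str
    read_cycles: int   # hypothetical array read latency
    write_cycles: int  # hypothetical array write latency


def simulate(trace: Iterable[Tuple[str, int]], tech: Tech, n_lines: int = 64,
             line_bytes: int = 64, miss_penalty: int = 100) -> int:
    """Total cycles for a trace of ("R" | "W", byte address) accesses on a
    direct-mapped, write-allocate cache built from `tech`. Dirty write-backs
    are ignored for brevity."""
    tags: List[Optional[int]] = [None] * n_lines
    cycles = 0
    for op, addr in trace:
        block = addr // line_bytes
        index, tag = block % n_lines, block // n_lines
        if tags[index] != tag:
            # Miss: fetch the line from the next level and write it into the array.
            cycles += miss_penalty + tech.write_cycles
            tags[index] = tag
        cycles += tech.write_cycles if op == "W" else tech.read_cycles
    return cycles


trace = [("R", 0), ("W", 0), ("R", 64), ("R", 0)]
sram = Tech("SRAM", read_cycles=2, write_cycles=2)     # hypothetical values
stt = Tech("STT-RAM", read_cycles=3, write_cycles=10)  # hypothetical values
print(simulate(trace, sram), simulate(trace, stt))  # 212 239
```

In this toy trace, most of the extra cycles for the NVM configuration come from writes (the two line fills and the store), which illustrates why costly writes are a central concern for NVM-based and hybrid cache hierarchies.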

Pre-requisites:
- Digital Design, Computer Architecture
- Knowledge of RISC-V architecture
- Experience with Xilinx Vivado, VHDL/Verilog
- Some scripting language (preferably Python), C++
Skills that will be acquired during project work:
- RISC-V-based System Design
- Knowledge about NVMs and caches
- System-level design and performance analysis
- FPGA Design tools
- Technical writing for research publications
Contact: Salim Ullah

Implementing Application-Specific Approximate Computing for RISC-V

This project explores the integration of approximate computing techniques into RISC-V architectures to optimize performance, energy efficiency, and resource utilization for specific applications. By selectively reducing computational accuracy where full precision is unnecessary, the project aims to achieve significant power and speed improvements while maintaining acceptable output quality. The research involves designing and implementing hardware- or software-based approximation methods, evaluating their impact on different workloads, and developing strategies for balancing efficiency with correctness in real-world scenarios.
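One software-level approximation that such a study could evaluate on a RISC-V core is loop perforation, which skips a fraction of loop iterations. The following Python sketch only illustrates the accuracy-efficiency trade-off; on the actual platform the loop would be compiled code. It perforates a summation, rescales the partial result, and picks the largest stride whose relative error on a calibration input stays within a tolerance. The function names, stride candidates, and tolerance are assumptions for illustration, not the project's method.

```python
from typing import Sequence


def perforated_sum(xs: Sequence[float], stride: int) -> float:
    """Loop perforation: run only every `stride`-th iteration of the summation
    loop and scale the partial sum up to compensate for the skipped ones."""
    if not xs:
        return 0.0
    partial, executed = 0.0, 0
    for i in range(0, len(xs), stride):
        partial += xs[i]
        executed += 1
    return partial * len(xs) / executed


def pick_stride(xs: Sequence[float], candidates: Sequence[int], max_rel_error: float) -> int:
    """Largest candidate stride whose relative error on the calibration input
    `xs` stays within `max_rel_error` (stride 1 = exact loop)."""
    exact = sum(xs)
    chosen = 1
    for s in sorted(candidates):
        rel_error = abs(perforated_sum(xs, s) - exact) / max(abs(exact), 1e-12)
        if rel_error <= max_rel_error:
            chosen = s
    return chosen


signal = [float(v) for v in range(1, 101)]
print(perforated_sum(signal, 2))             # 5000.0 (exact sum: 5050.0)
print(pick_stride(signal, [1, 2, 4], 0.02))  # 2
```

A stride of 2 halves the loop's work for about 1% error on this input, while a stride of 4 exceeds the 2% tolerance; in practice the tolerance would be checked against application-level quality metrics on representative workloads.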

Pre-requisites:
- Digital Design, Computer Architecture
- Knowledge of RISC-V architecture
- Experience with Xilinx Vivado, VHDL/Verilog
- Some scripting language (preferably Python), C++
- Simulation & Performance Evaluation: working with tools such as gem5, Spike, QEMU, or other RISC-V simulators.
Skills that will be acquired during project work:
- RISC-V-based System Design & RISC-V Processor Customization
- Hardware-Software Co-Design
- Power and Performance Analysis
- Technical writing for research publications
Contact: Salim Ullah

Architectural Trade-Offs in Chiplet-Based DNN Systems: Partitioning and Approximation Perspective

This project investigates architectural trade-offs in chiplet-based deep neural network (DNN) systems, with a particular focus on functional partitioning and approximate computing. The main goal is to establish a design space exploration environment using the Gemini framework, which enables mapping and architecture co-exploration for large-scale DNN accelerators. The project involves implementing different chiplet architectures within Gemini, exploring various partitioning granularities, and analyzing their impact on performance metrics such as latency, throughput, and energy efficiency. Additionally, the integration of approximate architectures into chiplets will be explored to evaluate accuracy-efficiency trade-offs. The study aims to provide insights into scalable and energy-efficient design strategies for modular DNN acceleration.
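A basic question behind functional partitioning is how to split a DNN's layers across chiplets. The following Python sketch, which is independent of the Gemini framework, solves a simplified version of this problem: assign contiguous layers to k identical chiplets arranged as a pipeline so that the most heavily loaded chiplet, which bounds the pipeline's throughput, is as light as possible. The per-layer costs are hypothetical, and inter-chiplet communication, memory capacity, and approximation are deliberately ignored.

```python
from functools import lru_cache
from typing import List, Tuple


def partition_layers(costs: List[int], k: int) -> Tuple[int, List[List[int]]]:
    """Split layers 0..n-1 into k contiguous groups (one per chiplet) so that
    the largest group cost, i.e. the pipeline bottleneck, is minimal."""
    n = len(costs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of layers")
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i: int, parts: int) -> int:
        """Minimal bottleneck for layers i..n-1 split into `parts` groups."""
        if parts == 1:
            return prefix[n] - prefix[i]
        # First group is layers i..j-1; leave at least one layer per remaining group.
        return min(max(prefix[j] - prefix[i], best(j, parts - 1))
                   for j in range(i + 1, n - parts + 2))

    # Reconstruct one optimal assignment by replaying the choices.
    groups, i = [], 0
    for parts in range(k, 1, -1):
        target = best(i, parts)
        for j in range(i + 1, n - parts + 2):
            if max(prefix[j] - prefix[i], best(j, parts - 1)) == target:
                groups.append(list(range(i, j)))
                i = j
                break
    groups.append(list(range(i, n)))
    return best(0, k), groups


layer_costs = [4, 2, 3, 7, 1, 5]  # hypothetical per-layer compute cost
print(partition_layers(layer_costs, 3))  # (9, [[0, 1, 2], [3], [4, 5]])
```

The dynamic program runs in O(k * n^2) time, which is adequate for layer-level granularity; finer partitioning granularities (e.g., splitting individual layers) would enlarge the search space considerably.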

Contact: Salim Ullah (salim.ullah@rub.de)
Reference Article: "Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators"