A basic GPU tutorial for newcomers, presented in the context of deep learning. It covers enough to understand the basic tradeoffs and the main optimization goals.
Taught at various guest lectures at the Technion and beyond.
This is a short version of the Accelerators and Accelerator Computing Systems course that I teach at the Technion.
I taught this course at the ACACES summer school organized by HiPEAC.
Hardware Trusted Execution Environments (TEEs), available in all modern CPUs, aim to guarantee the confidentiality and integrity of program execution and data. A unique property of TEEs is their strong threat model, which permits an attacker to control the OS or hypervisor, making TEEs the foundation of confidential computing offerings in modern cloud systems.
SpecFuzz is the first tool to enable dynamic testing of Spectre V1 vulnerabilities.
This presentation provides a high-level explanation of the concept of speculation exposure.
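For readers unfamiliar with Spectre V1, the snippet below is the classic bounds-check-bypass gadget, given here as a minimal C illustration (not code taken from SpecFuzz itself): if the branch is mispredicted, the out-of-bounds, secret-dependent load leaves a measurable trace in the cache. Dynamic testing in the SpecFuzz style aims to expose exactly this kind of speculative out-of-bounds access.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic Spectre V1 (bounds-check bypass) gadget.
 * Under misspeculation the CPU may assume x < array1_size even when it is not,
 * so array1[x] is read out of bounds and the secret-dependent load from array2
 * leaves a cache footprint that an attacker can later measure. */

size_t array1_size = 16;
uint8_t array1[16];
uint8_t array2[256 * 512];
volatile uint8_t sink;

void victim(size_t x) {
    if (x < array1_size) {               /* bounds check that can be speculated past */
        sink = array2[array1[x] * 512];  /* secret-dependent access: cache side channel */
    }
}

int main(void) {
    victim(0);  /* trivial in-bounds call; the attack trains and then abuses the branch */
    return 0;
}
```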
This talk surveys the evolution of GPU-native I/O services.
I will first discuss the lessons we learned from our first prototypes for GPUfs (file access from GPUs), GPUnet (streaming network I/O) and GPUrdma (RDMA support), focusing on the main
hardware and software hurdles that made their implementation and use harder than expected.
I will then describe our recent works that strive to overcome the limitations of earlier systems by using new hardware capabilities. I will first discuss a system that provides GPU access to files via GPU memory-mapped files. Unlike GPUfs, it removes the file system software layer from the GPU, but builds on the GPUfs distributed page cache principles to fully integrate the OS page cache into GPU memory with the help of GPU page faults.
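As a rough illustration of the end goal (GPU code reading file-backed memory through an ordinary pointer), here is a minimal host-side sketch in C using plain mmap plus the standard CUDA mapped pinned-memory API. This is not the mechanism described in the talk, which integrates the OS page cache with GPU memory via GPU page faults; the file name is hypothetical, error handling is abbreviated, and whether a given file-backed mapping can be registered this way depends on the platform and driver.

```c
#include <cuda_runtime_api.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("weights.bin", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Map the file into host memory (MAP_PRIVATE allows registering it below). */
    void *host_ptr = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (host_ptr == MAP_FAILED) { perror("mmap"); return 1; }

    /* Pin the file-backed pages and make them visible to the device. */
    if (cudaHostRegister(host_ptr, st.st_size, cudaHostRegisterMapped) != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister failed\n");
        return 1;
    }

    void *dev_ptr;
    cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);
    /* dev_ptr can now be passed to a GPU kernel that reads the file contents
       directly, without an explicit read()/cudaMemcpy step. */

    cudaHostUnregister(host_ptr);
    munmap(host_ptr, st.st_size);
    close(fd);
    return 0;
}
```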
Next, I will focus on the new opportunities afforded by SmartNICs to improve the performance and efficiency of GPU-accelerated computing services. We develop an accelerator-centric network server architecture which offloads the server data and control planes to the SmartNIC and enables direct networking from accelerators via a lightweight, hardware-friendly I/O mechanism. In addition to freeing the CPU from running network processing and accelerator management, as in GPUnet and GPUrdma, we also eliminate the need to run network logic on the GPU, streamlining the integration of network I/O with existing GPUs. Moreover, this architecture easily scales beyond a single machine, enabling convenient network interfaces for remote GPUs. We show experimentally that the use of SmartNICs for GPU-native I/O is portable across accelerators, provides good scalability, and can be efficient on different types of SmartNICs. For example, our Mellanox BlueField-based LeNet neural network inference server achieves a 300 µs request turnaround time and linear scaling with 12 GPUs located in three different servers, and is projected to scale linearly to 100 GPUs without using the host CPU.
Future systems will be omni-programmable: alongside CPUs, GPUs, security accelerators and FPGAs, they will execute user code near-storage, near-network, and near-memory. Ironically, while hardware specialization and near-data processing break the power and memory walls, an emerging programmability wall will become a key impediment to realizing the promised performance and power-efficiency benefits of omni-programmable systems. I argue that the root cause of this programming complexity lies in today's CPU-centric operating system (OS) design, which is no longer appropriate for omni-programmable systems.
In this talk I will describe the ongoing efforts in my lab to design an accelerator-centric OS called OmniX, which extends standard OS abstractions into accelerators while maintaining a coherent view of the system among all the processors. In OmniX, near-data computation accelerators may directly invoke tasks and access I/O services among themselves, excluding the CPU from performance-critical data and control plane operations and turning it into “yet another” accelerator for sequential computations. I will show how OmniX design principles have been successfully applied to GPUs, programmable NICs, and Intel SGX.
Here is the video of my presentation at the Huawei Systems Software Innovation Summit 2021:
Foreshadow is a speculative execution attack on Intel SGX. This talk explains the basic mechanisms of speculative execution attacks and then delves into the details of Foreshadow.
NICA is a SmartNIC-based infrastructure for inline acceleration of network applications. This talk explains the main concepts.
Seamlessly securing applications by running them in Intel SGX is not quite realistic due to performance overheads and hardware side channels. In this talk we argue that there are intermediate points on the security-performance tradeoff curve, which trade some security for better performance and vice versa. We envision that this adjustable security can be enabled by simply recompiling a program with different flags, and show a few ideas for how it can be achieved in practice using the CosMIX compiler (ATC'19).