Fuzzing Away Speculative Execution Attacks
December 2019
December 2019
Intel Israel
Technion Cyber Retreat

SpecFuzz is the first tool to enable dynamic testing for Spectre V1 vulnerabilities.

This presentation provides a high-level explanation of the concept of speculation exposure.
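For readers unfamiliar with Spectre V1, the code pattern in question can be sketched as follows. This is an illustrative bounds-check-bypass gadget, not code from SpecFuzz or from the talk; the names `array1`, `probe`, `victim`, and the constants are hypothetical, and the 64-byte stride assumes a typical cache-line size.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16u    /* hypothetical array size */
#define PROBE_STRIDE 64u   /* assumed cache-line size in bytes */

static uint8_t array1[ARRAY1_SIZE] = {1, 2, 3, 4};
static uint8_t probe[256 * PROBE_STRIDE];

/* Architecturally this function never reads out of bounds. If the branch is
   mispredicted, however, the two dependent loads may execute speculatively
   with an out-of-bounds idx, leaving a cache footprint in `probe` whose
   position depends on the byte read from beyond array1. */
uint8_t victim(size_t idx) {
    if (idx < ARRAY1_SIZE) {
        uint8_t value = array1[idx];          /* first load: may be speculative */
        return probe[value * PROBE_STRIDE];   /* second load: encodes value in the cache */
    }
    return 0;
}
```

Ordinary testing cannot observe the problem, because every architecturally visible result of `victim` is correct; the out-of-bounds access exists only on the mispredicted path.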

GPU native I/O 2.0 - leveraging new hardware for efficient GPU I/O abstractions
November 2019
NVIDIA HQ, Santa Clara

This talk surveys the evolution of GPU-native I/O services.

I will first discuss the lessons we learned from our first prototypes of GPUfs (file access from GPUs), GPUnet (streaming network I/O) and GPUrdma (RDMA support), focusing on the main hardware and software hurdles that made their implementation and use harder than expected.

I will then describe our recent work that strives to overcome the limitations of the earlier systems by using new hardware capabilities. I will first discuss a system that gives GPUs access to the file system via GPU memory-mapped files. Unlike GPUfs, we remove the file system software layer from the GPU, but build on the GPUfs distributed page cache principles to fully integrate the OS page cache into GPU memory with the help of GPU page faults.
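The memory-mapped programming model itself can be illustrated on the CPU. The sketch below uses standard POSIX `mmap`; it is only an analogy for the GPU-side mechanism described in the talk, and the function name `sum_file_bytes` is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum all bytes of a file through a read-only memory mapping.
   Returns -1 on error and 0 for an empty file (mmap rejects length 0). */
long sum_file_bytes(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }

    size_t len = (size_t)st.st_size;
    unsigned char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED) return -1;

    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];  /* the first touch of each page may fault it in from the page cache */
    }

    munmap(data, len);
    return sum;
}
```

There are no explicit read calls: the file contents appear as ordinary memory, and page faults pull the data in on demand.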

Next, I will focus on the new opportunities afforded by SmartNICs to improve the performance and efficiency of GPU-accelerated computing services. We develop an accelerator-centric network server architecture which offloads the server data and control plane to the SmartNIC and enables direct networking from accelerators via a lightweight, hardware-friendly I/O mechanism. In addition to freeing the CPU from running network processing and accelerator management, as in GPUnet and GPUrdma, we also eliminate the need to run network logic on the GPU, streamlining the integration of network I/O with existing GPUs. Moreover, this architecture easily scales beyond a single machine, enabling convenient network interfaces for remote GPUs. We show experimentally that the use of SmartNICs for GPU-native I/O is portable across accelerators, provides good scalability, and can be efficient on different types of SmartNICs. For example, our Mellanox BlueField-based LeNet neural network inference server achieves a 300 µs request turnaround time, scales linearly across 12 GPUs located in three different servers, and is projected to scale linearly to 100 GPUs without using the host CPU.

OmniX - an OS architecture for omni-programmable systems
November 2019
October 2019
September 2019
May 2019
UC Berkeley
TU Dresden

Future systems will be omni-programmable: alongside CPUs, GPUs, security accelerators and FPGAs, they will execute user code near storage, near the network, and near memory. Ironically, while hardware specialization and near-data processing break the power and memory walls, an emerging programmability wall will become a key impediment to realizing the promised performance and power efficiency benefits of omni-programmable systems. I argue that the root cause of the programming complexity lies in today's CPU-centric operating system (OS) design, which is no longer appropriate for omni-programmable systems.

In this talk I will describe the ongoing efforts in my lab to design an accelerator-centric OS called OmniX, which extends standard OS abstractions into accelerators while maintaining a coherent view of the system among all the processors. In OmniX, near-data computation accelerators may directly invoke tasks and access I/O services among themselves, excluding the CPU from the performance-critical data and control plane operations and turning it into “yet another” accelerator for sequential computations. I will show how the OmniX design principles have been successfully applied to GPUs, programmable NICs and Intel SGX.

Foreshadow attack explained
May 2019
March 2019
Technion Cyber Day

Foreshadow is a speculative execution attack on Intel SGX. This talk explains the basic mechanisms of speculative execution attacks and then delves into the details of Foreshadow.

Accelerating Network Application on SmartNICs with NICA
October 2018
June 2018
IBM Research

NICA is a SmartNIC-based infrastructure for inline acceleration of network applications. This talk explains the main concepts.

Zero-effort adaptable security
March 2018

Seamlessly securing applications by running them in Intel SGX is not quite realistic due to performance overheads and hardware side channels. In this talk we argue that there are intermediate points on the security-performance tradeoff curve, which trade some security for better performance and vice versa. We envision that this adjustable security can be enabled by simply recompiling a program with different flags, and present a few ideas for how it can be achieved in practice using the CosMIX compiler (ATC19).