Device-Centric Server Architectures

Members: Dongup Kwon, Pyeongsu Park, Eunjin Baek, Heetaek Jeong, Wonsik Lee, Junehyuk Boo, Kanghyun Choi, Eunbok Lee, Seongmin Na, Dongryeong Kim, Yujin Chung, Bogyeong Park, Jinha Jeong, Wonseok Lee

Motivation

In this project, we design and prototype innovative server architectures to meet the needs of emerging server applications (e.g., data analytics, machine learning, virtual machines).

Research

Device orchestration [MICRO’15, ISCA’18]. Modern high-performance servers exploit a large number of fast devices (e.g., NVMe SSDs, 100GbE NICs, GPUs, AI accelerators) installed in a single server. Examples include datacenter servers, SSD-array storage systems, and accelerator-array AI training servers. However, orchestrating many fast devices simultaneously demands high host-CPU bandwidth and incurs long device-control latency. Our DCS architecture resolves this problem by introducing DCS Engine, a fast and scalable FPGA-based device orchestrator. On behalf of the host-side CPU and OS, DCS Engine manages all data and control paths among devices at the hardware level. In this way, a DCS server can significantly reduce the latency of inter-device communication while mostly bypassing the host CPUs. In addition, by providing near-device processing (NDP) units on DCS Engine, a DCS server can further improve overall performance and scalability.

Storage server [HPCA’19, MICRO’19]. Modern storage servers aim to achieve high performance by deploying many SSDs on a single server. However, their performance, scalability, and cost-effectiveness depend heavily on the cost of the host CPUs and on the data reduction rate of the storage devices. To resolve these issues, we build CIDR and FIDR, two DCS-inspired, fast, and scalable storage server architectures for SSD arrays. Our DCS-inspired storage servers (1) offload NDP-friendly operations from the host CPUs to FPGAs and (2) have the FPGAs cooperatively manage the storage devices and deduplicate the data.
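As a rough illustration of what deduplicating data means, the following minimal sketch stores each distinct fixed-size chunk once and represents a write as a list of chunk fingerprints. It is a generic software example, not the CIDR or FIDR design: the class name DedupStore, the 4 KiB default chunk size, and the use of SHA-256 fingerprints are all assumptions made for illustration, and the FPGA offloading described above is not modeled.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory chunk store that keeps one copy per unique chunk."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size   # assumed fixed chunk size in bytes
        self.chunks = {}               # fingerprint -> chunk bytes
        self.logical_bytes = 0         # total bytes written by clients

    def write(self, data):
        """Split data into chunks and return the list of fingerprints (the recipe)."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).digest()
            # Store the chunk only if this fingerprint has not been seen before.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Reassemble the original data from a recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self):
        """Bytes actually kept after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    recipe = store.write(b"AAAABBBBAAAA")
    print(store.logical_bytes, store.stored_bytes())
```

In this toy run the third chunk duplicates the first, so fewer bytes are stored than were written. Computing and looking up the fingerprints is the kind of per-chunk work that consumes host CPU cycles when done in software.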

AI training server [MICRO’20]. Modern AI training servers aim to achieve high training performance by deploying as many CPUs, GPUs, accelerators, and storage devices as possible on a single server. In this work, we first identify the need for a new AI training server architecture that provides extreme-scale scalability, and then describe the challenge of systematically balancing heterogeneous operations among the devices. We resolve the issue by introducing TrainBox, our DCS-inspired, highly scalable training server architecture, which systematically balances the training operations among the devices.

Virtual machine emulation [OSDI’20, ATC’21, TOS’22]. Modern servers must support virtualization to serve as many users as possible. However, modern software-based virtualization requires too much CPU bandwidth, whereas modern hardware-based virtualization cannot provide flexible virtualization services. We resolve the issue by introducing FVM, our DCS-inspired virtualization server architecture, which exploits an FPGA to provide fast and flexible virtualization.

Network acceleration [ISCA’23]. As complex workloads that run across many servers demand higher networking throughput, more CPU cycles are consumed by the TCP stack. However, none of the prior approaches simultaneously satisfies all of the critical requirements of TCP: high performance, many connections, and high flexibility. In this work, we design F4T (1) to process stateful operations back-to-back without stalls and (2) to efficiently manage the TCP states among multiple memory modules. F4T also provides a full SW-HW stack that allows applications to use F4T without modification.

Interconnect. Modern servers benefit from fast devices, but both device-to-device and node-to-node communication severely limit their scalability. To address the issue, many inter-device (e.g., PCIe, NVLink, CXL, CCIX) and inter-node (e.g., InfiniBand, RoCE, Gen-Z) interconnection methods have been introduced. We are currently working on improving the scale-up and scale-out capability of our server solutions by taking full advantage of these emerging interconnection methods.

Software release

  • DCS: To be released soon.

Publications

  • F4T: A Fast and Flexible FPGA-based Full-stack TCP Acceleration Framework
    Junehyuk Boo, Yujin Chung, Eunjin Baek, Seongmin Na, Changsu Kim, and Jangwoo Kim
    ACM/IEEE International Symposium on Computer Architecture (ISCA), Jun. 2023
  • SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices
    Dongup Kwon*, Wonsik Lee*, Dongryeong Kim, Junehyuk Boo, and Jangwoo Kim
    ACM Transactions on Storage (TOS), 2022
  • A Fast and Flexible Hardware-based Virtualization Mechanism for Computational Storage Devices
    Dongup Kwon, Dongryeong Kim, Junehyuk Boo, Wonsik Lee, and Jangwoo Kim
    USENIX Annual Technical Conference (ATC), Jun. 2021
  • FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization
    Dongup Kwon, Junehyuk Boo, Dongryeong Kim, and Jangwoo Kim
    USENIX Symposium on Operating Systems Design and Implementation (OSDI), Nov. 2020
  • TrainBox: An Extreme-Scale Neural Network Training Server by Systematically Balancing Operations
    Pyeongsu Park, Heetaek Jeong, and Jangwoo Kim
    IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2020
  • FIDR: A Scalable Storage System for Fine-Grain Inline Data Reduction with Efficient Memory Handling
    Mohammadamin Ajdari, Wonsik Lee, Pyeongsu Park, Joonsung Kim, and Jangwoo Kim
    IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2019
  • CIDR: A Cost-Effective In-line Data Reduction System for Terabit-per-Second Scale SSD Arrays
    Mohammadamin Ajdari, Pyeongsu Park, Joonsung Kim, Dongup Kwon, and Jangwoo Kim
    IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2019
  • DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture
    Dongup Kwon*, Jaehyung Ahn*, Dongju Chae, Mohammadamin Ajdari, Jaewon Lee, Suheon Bae, Youngsok Kim, and Jangwoo Kim
    ACM/IEEE International Symposium on Computer Architecture (ISCA), Jun. 2018
  • DCS: A Fast and Scalable Device-Centric Server Architecture
    Jaehyung Ahn*, Dongup Kwon*, Youngsok Kim, Mohammadamin Ajdari, Jaewon Lee, and Jangwoo Kim
    IEEE/ACM International Symposium on Microarchitecture (MICRO), Dec. 2015

* Contributed equally