Device-Centric Server Architectures

Members: Dongup Kwon, Pyeongsu Park, Eunjin Baek, Heetaek Jeong, Wonsik Lee, Junehyuk Boo, Kanghyun Choi, Eunbok Lee, Seongmin Na, Dongryeong Kim, Yujin Chung, Bogyeong Park, Jinha Jeong, Wonseok Lee

Motivation

In this project, we design and prototype innovative server architectures to satisfy the need of emerging server applications (e.g., data analytics, machine learning, virtual machine).

Research

Device orchestration [MICRO’15, ISCA’18]. Modern high-performance servers aim to achieve high performance by exploiting a large number of high-performance devices (e.g., NVMe SSDs, 100GbE NICs, GPUs, AI accelerators) installed in a single server. Such examples include datacenter servers, SSD-array storage systems, and accelerator-array AI training servers. But, such servers suffer from the host-CPU’s high bandwidth requirement and device-control latency for orchestrating a large number of fast devices simultaneously. Our DCS architecture effectively resolves the problem by introducing DCS-Engine, a fast and scalable FPGA-based device orchestration device. On behalf of the host-side CPU and OS, DCS Engine manages all data and control paths among devices at the hardware level. In this way, a DCS server can significantly reduce the latency of inter-device communications, while bypassing the host CPUs mostly. In addition, by providing near-device processing (NDP) units on DCS Engine, a DCS server can further improve the overall performance and scalability.

Storage server [HPCA’19, MICRO’19]. Modern storage servers aim to achieve high performance by deploying a number of SSDs on a single server. But, their performance, scalability, and cost-effectiveness highly depend on the cost of host CPUs and the data storage devices’ data reduction rate. To resolve the issues, we build CIDR and FIDR which are two DCS-inspired, fast, and scalable SSD-array supporting storage server architectures. Our DCS-inspired storage servers (1) offload the NDP-friendly operations from the host CPUs to FPGAs and (2) have them cooperatively manage the storage devices and deduplicate the data.

AI training server [MICRO’20]. Modern AI training servers aim to achieve high training performance by deploying as many CPUs, GPUs, accelerators, and storage devices on a single server as possible. In this work, we first identify the need of architecting a new AI training sever to provide an extreme-scale scalability, and then introduce the challenge in systemically balancing heterogeneous operations among the devices. We resolve the issue by introducing TrainBox, our DCS-inspired highly-scale training server architecture which can systemically balance the training operations among the devices.

Virtual machine emulation [OSDI’20, ATC’21, TOS’22]. Modern servers must support virtualization to servers as many as users. However, modern software-based virtualization requires too much CPU bandwidth, whereas modern hardware-based virtualization cannot provide flexible virtualization services. We resolve the issue by introducing FVM, our DCS-inspired fast and flexible virtualization server architecture which exploits an FPGA to provide a fast and flexible virtualization capability.

Network Acceleration [ISCA’23]. As complex workloads that run on many servers are pursuing higher networking throughput, more CPU cycles are consumed to support the TCP stack. However, none of the prior approaches satisfy all of the critical requirements of TCP simultaneously, which are high performance, many connections, and high flexibility. In this work, we design F4T (1) to process stateful operations back-to-back without stalls and (2) efficiently manage the TCP states among multiple memory modules. F4T also provides a full SW-HW stack that allows applications to easily utilize F4T without application modifications.

Interconnect. Modern servers are benefiting from fast devices, but both their device-to-device and node-to-node communication severely limit the servers’ scalability. To address the issue, many inter-device (e.g., PCIe, NVLink, CXL, CCIX) and inter-node (e.g., InfiniBand, RoCE, Gen-Z) interconnections methods have been introduced. We are currently working on improving our server solutions’ scale-up and scale-out capability by taking the best advantages of the emerging interconnection methods.

Software release

DCS: To be released soon.

Publications

F4T: A Fast and Flexible FPGA-based Full-stack TCP Acceleration Framework
Junehyuk Boo, Yujin Chung, Eunjin Baek, Seongmin Na, Changsu Kim, and Jangwoo Kim
ACM/IEEE International Symposium on Computer Architecture (ISCA), Jun. 2023
SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices
Dongup Kwon*, Wonsik Lee*, Dongryeong Kim, Junehyuk Boo, and Jangwoo Kim
ACM Transactions on Storage (TOS), 2022
A Fast and Flexible Hardware-based Virtualization Mechanism for Computational Storage Devices
Dongup Kwon, Dongryeong Kim, Junehyuk Boo, Wonsik Lee, and Jangwoo Kim
USENIX Annual Technical Conference (ATC), Jun. 2021
FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization
Dongup Kwon, Junehyuk Boo, Dongryeong Kim, and Jangwoo Kim
USENIX Symposium on Operating Systems Design and Implementation (OSDI), Nov. 2020
TrainBox: An Extreme-Scale Neural Network Training Server by Systematically Balancing Operations
Pyeongsu Park, Heetaek Jeong, and Jangwoo Kim
IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2020
FIDR: A Scalable Storage System for Fine-Grain Inline Data Reduction with Efficient Memory Handling
Mohammadamin Ajdari, Wonsik Lee, Pyeongsu Park, Joonsung Kim, and Jangwoo Kim
IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2019
CIDR: A Cost-Effective In-line Data Reduction System for Terabit-per-Second Scale SSD Arrays
Mohammadamin Ajdari, Pyeongsu Park, Joonsung Kim, Dongup Kwon, and Jangwoo Kim
IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2019
DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture
Dongup Kwon*, Jaehyung Ahn*, Dongju Chae, Mohammadamin Ajdari, Jaewon Lee, Suheon Bae, Youngsok Kim, and Jangwoo Kim
ACM/IEEE International Symposium on Computer Architecture (ISCA), Jun. 2018
DCS: A Fast and Scalable Device-Centric Server Architecture
Jaehyung Ahn*, Dongup Kwon*, Youngsok Kim, Mohammadamin Ajdari, Jaewon Lee, and Jangwoo Kim
ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 2015

* Contributed equally