
Database Engines 1
Chaired by Mohammad Dashti (MongoDB)

Kapil Vaidya (MIT)*, Tim Kraska (MIT), Subarna Chatterjee (Harvard University), Eric R Knorr (Harvard), Michael Mitzenmacher (Harvard), Stratos Idreos (Harvard)
We present Sparse Numerical Array-Based Range Filters (SNARF), a learned range filter that efficiently supports range queries for numerical data. SNARF creates a model of the data distribution to map the keys into a bit array, which is stored in a compressed form. Together, the model and the compressed bit array constitute SNARF and are used to answer membership queries. We evaluate SNARF on multiple synthetic and real-world datasets, both as a stand-alone filter and integrated into RocksDB. For range queries, SNARF provides up to 50x better false positive rate than state-of-the-art range filters such as SuRF and Rosetta, with the same space usage. We also evaluate SNARF in RocksDB as a filter replacement that filters requests before they access on-disk data structures. For RocksDB, SNARF can improve the execution time of the system by up to 10x compared to SuRF and Rosetta for certain read-only workloads.

ByteHTAP: ByteDance's HTAP System with High Data Freshness and Strong Data Consistency
Jianjun Chen (Bytedance)*, Yonghua Ding, Ye Liu (Bytedance Inc.), Fangshi Li (Bytedance), Li Zhang (ByteDance), Mingyi Zhang (ByteDance Inc), Kui Wei (ByteDance Inc.), Cao Lixun (ByteDance), Dan Zou (ByteDance), Yang Liu (ByteDance), Lei Zhang (ByteDance), Rui Shi (ByteDance Inc.), Wei Ding (Bytedance), KAI WU (ByteDance), Shangyu Luo (ByteDance), Jason Sun (Bytedance), Yuming Liang (ByteDance Inc.)
In recent years, at ByteDance, we have seen a growing number of business scenarios that require complex analysis over freshly imported data, together with transaction support and strong data consistency. In this paper, we describe our journey of building ByteHTAP, an HTAP system with high data freshness and strong data consistency. It adopts a separate-engine, shared-storage architecture. Its modular design fully utilizes an existing ByteDance OLTP system and an open-source OLAP system. This choice saves us substantial resources and development time, and allows easy future extensions such as replacing the query processing engine with alternatives. ByteHTAP provides high data freshness with less than one second of delay, which enables many new business opportunities for our customers. Customers can also configure different data freshness thresholds based on their business needs. ByteHTAP also provides strong data consistency through global timestamps across its OLTP and OLAP systems, which relieves application developers from handling complex data consistency issues themselves. In addition, we introduce several important performance optimizations to ByteHTAP, such as pushing computations down to the storage layer and using delete bitmaps to handle deletes efficiently. Lastly, we share our lessons and best practices from developing and running ByteHTAP in production.

Query Processing on Tensor Computation Runtimes
Dong He (University of Washington)*, Supun C Nakandala (University of California, San Diego), Dalitso Banda (Microsoft), Rathijit Sen (Microsoft), Karla Saur (Microsoft), Kwanghyun Park (Microsoft), Carlo Curino (Microsoft - GSL), Jesús Camacho-Rodríguez (Microsoft), Konstantinos Karanasos (Meta), Matteo Interlandi (Microsoft)
The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This has led to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding the low-level complexity behind a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientists to efficiently exploit the capabilities offered by the new hardware. In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. We design, build, and evaluate Tensor Query Processor (TQP): TQP transforms SQL queries into tensor programs and executes them on TCRs. TQP is able to run the full TPC-H benchmark by implementing novel algorithms for relational operators on top of tensor routines. At the same time, TQP can support various hardware while requiring only a fraction of the usual development effort.
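
The SNARF abstract describes its mechanism only at a high level. A model of the key distribution maps each key to a position in a bit array, and a range query checks whether any bit is set in the mapped range. The following is a minimal Python sketch of that idea under explicitly hypothetical assumptions, and it is not the authors' implementation:

- The class name `LearnedRangeFilter` is hypothetical.
- The piecewise-linear CDF model is an assumed stand-in for the SNARF model.
- The parameters `bits_per_key` and `num_segments` are hypothetical.
- The abstract only says that the bit array is compressed. Here a sorted list of set-bit positions stands in for the compressed bit array.

```python
import bisect
import math


class LearnedRangeFilter:
    """Hypothetical SNARF-style range filter (illustrative sketch only).

    A monotone piecewise-linear CDF model (an assumption, not SNARF's model)
    maps each key to a slot in a logical bit array of n * bits_per_key slots.
    A sorted list of set slots stands in for the compressed bit array.
    """

    def __init__(self, keys, bits_per_key=8, num_segments=16):
        keys = sorted(set(keys))
        if not keys:
            raise ValueError("need at least one key")
        n = len(keys)
        self.min_key, self.max_key = keys[0], keys[-1]
        self.size = n * bits_per_key  # logical bit-array length

        # Model: interpolate rank / (n - 1) between evenly spaced sample keys.
        step = max(1, n // num_segments)
        idxs = list(range(0, n, step))
        if idxs[-1] != n - 1:
            idxs.append(n - 1)
        self.xs = [keys[i] for i in idxs]
        self.ys = [i / (n - 1) if n > 1 else 0.0 for i in idxs]

        # Set one bit per key at its model-predicted slot.
        self.bits = sorted({self._slot(k) for k in keys})

    def _cdf(self, x):
        """Monotone non-decreasing estimate of the fraction of keys <= x."""
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        j = bisect.bisect_right(self.xs, x)
        x0, x1 = self.xs[j - 1], self.xs[j]
        y0, y1 = self.ys[j - 1], self.ys[j]
        # Clamp to y1 so rounding can never break monotonicity across segments.
        return min(y1, y0 + (y1 - y0) * (x - x0) / (x1 - x0))

    def _slot(self, x):
        return math.floor(self._cdf(x) * (self.size - 1))

    def may_contain_range(self, lo, hi):
        """False means no key lies in [lo, hi]; True may be a false positive."""
        if lo > hi or hi < self.min_key or lo > self.max_key:
            return False
        first, last = self._slot(lo), self._slot(hi)
        i = bisect.bisect_left(self.bits, first)
        return i < len(self.bits) and self.bits[i] <= last
```

Because the model is monotone, any key in `[lo, hi]` maps to a slot between the slots of `lo` and `hi`, so this sketch has no false negatives. A false positive occurs when an empty range maps onto a slot set by a neighboring key. Raising the hypothetical `bits_per_key` spreads keys further apart, which lowers that rate at the cost of space.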

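The ByteHTAP abstract mentions delete bitmaps and global timestamps without giving details. As an illustration only, the following sketch shows one generic way the two ideas can be combined:

- An immutable column block records deletes as new delete-bitmap versions, each tagged with a commit timestamp, instead of rewriting column data.
- A scan at a given read timestamp therefore sees a consistent set of deleted rows.

The class `ColumnBlock`, its methods, and this versioning scheme are hypothetical and are not taken from ByteHTAP.

```python
import bisect


class ColumnBlock:
    """Hypothetical immutable column block with timestamped delete bitmaps."""

    def __init__(self, values):
        self.values = tuple(values)  # column data is never rewritten
        self._ts = [0]               # commit timestamp of each bitmap version (ascending)
        self._bitmaps = [0]          # bit r set => row r deleted as of that version

    def delete(self, rows, commit_ts):
        """Record deletes as a new bitmap version instead of rewriting the block."""
        if commit_ts <= self._ts[-1]:
            raise ValueError("commit timestamps must strictly increase")
        bitmap = self._bitmaps[-1]
        for r in rows:
            if not 0 <= r < len(self.values):
                raise IndexError(r)
            bitmap |= 1 << r
        self._ts.append(commit_ts)
        self._bitmaps.append(bitmap)

    def scan(self, read_ts):
        """Return the live values as of read_ts (a snapshot read)."""
        if read_ts < 0:
            raise ValueError("read_ts must be non-negative")
        # Latest bitmap version committed at or before read_ts.
        version = bisect.bisect_right(self._ts, read_ts) - 1
        bitmap = self._bitmaps[version]
        return [v for r, v in enumerate(self.values) if not (bitmap >> r) & 1]
```

Consider a block holding `[10, 20, 30, 40]` after `delete([1], commit_ts=5)`. `scan(4)` still returns all four values, while `scan(5)` returns `[10, 30, 40]`. For simplicity, this sketch keeps a full bitmap copy per version.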