Ankit Bhardwaj and Chinmay Kulkarni, University of Utah; Reto Achermann, University of British Columbia; Irina Calciu, VMware Research; Sanidhya Kashyap, EPFL; Ryan Stutsman, University of Utah; Amy Tai and Gerd Zellweger, VMware Research. Camera-ready submission (all accepted papers): 2 April 2021; Main conference program: 27-28 April 2021; All deadline times are . Session Chairs: Nadav Amit, VMware Research Group, and Ada Gavrilovska, Georgia Institute of Technology, Stephen Ibanez, Alex Mallery, Serhat Arslan, and Theo Jepsen, Stanford University; Muhammad Shahbaz, Purdue University; Changhoon Kim and Nick McKeown, Stanford University. This talk will discuss several examples with very different solutions. P3 exposes a simple API that captures many different classes of GNN architectures for generality. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. In addition, increasing CPU core counts further complicate kernel development. HotCRP.com signin Sign in using your HotCRP.com account. Our evaluation on the SPEC benchmarks shows that SanRazor can reduce the overhead of sanitizers significantly, from 73.8% to 28.062.0% for AddressSanitizer, and from 160.1% to 36.6124.4% for UndefinedBehaviorSanitizer (depending on the applied reduction scheme). Editor in charge: Daniel Petrolia . While verifying GoJournal, we found one serious concurrency bug, even though GoJournal has many unit tests. This is especially true for DPF over Rnyi DP, a highly composable form of DP. Furthermore, such performance can be achieved without any modification in applications, network hardware, kernel CPU schedulers and/or kernel network stack. We have made Fluffy publicly available at https://github.com/snuspl/fluffy to contribute to the security of Ethereum. We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application and only requires a modest two-bit change to the device interface. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. Authors must limit their responses to (a) correcting factual errors in the reviews or (b) directly addressing questions posed by reviewers. Welcome to the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI '22) submissions site. Authors may submit a response to those reviews until Friday, March 5, 2021. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by gracefully coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads. Authors may use this for content that may be of interest to some readers but is peripheral to the main technical contributions of the paper. Fortunately, we observe that the backups for high availability in modern distributed OLTP systems can be retrofitted to bridge the analytical queries and transactions in HTAP workloads. This paper demonstrates that it is possible to achieve s-scale latency using Linux kernel storage stack, even when tens of latency-sensitive applications compete for host resources with throughput-bound applications that perform read/write operations at throughput close to hardware capacity. Qing Wang, Youyou Lu, Junru Li, and Jiwu Shu, Tsinghua University. By monitoring the status of each job during training, Pollux models how their goodput (a novel metric we introduce that combines system throughput with statistical efficiency) would change by adding or removing resources. Existing decentralized systems like Steemit, OpenBazaar, and the growing number of blockchain apps provide alternatives to existing services. Therefore, developers typically find data locality issues via dynamic profiling and repair them manually. The main contribution of this paper is GoJournal, a verified, concurrent journaling system that provides atomicity for storage applications, together with Perennial 2.0, a framework for formally specifying and verifying concurrent crash-safe systems. Submitted papers must be no longer than 12 single-spaced 8.5 x 11 pages, including figures and tables, plus as many pages as needed for references, using 10-point type on 12-point (single-spaced) leading, two-column format, Times Roman or a similar font, within a text block 7 wide x 9 deep. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. First, Fluffy mutates and executes multi-transaction test cases to find consensus bugs which cannot be found using existing fuzzers for Ethereum. DMon speeds up PostgreSQL, one of the most popular database systems, by 6.64% on average (up to 17.48%). If your paper is accepted and you need an invitation letter to apply for a visa to attend the conference, please contact conference@usenix.org as soon as possible. Pages should be numbered, and figures and tables should be legible in black and white, without requiring magnification. The key insight in blk-switch is that Linux's multi-queue storage design, along with multi-queue network and storage hardware, makes the storage stack conceptually similar to a network switch. One important reason for the high cost is, as we observe in this paper, that many sanitizer checks are redundant the same safety property is repeatedly checked leading to unnecessarily wasted computing resources. Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, and Liyan Zheng, Tsinghua University; Yuanzhi Li, Carnegie Mellon University; Kaiyuan Rong and Yuanyong Chen, Tsinghua University; Zhihao Jia, Carnegie Mellon University and Facebook. The chairs may reject abstracts or papers on the basis of egregious missing or extraneous conflicts. Dorylus is up to 3.8 faster and 10.7 cheaper compared to existing sampling-based systems. Submission of a response is optional. This paper addresses a key missing piece in the current ecosystem of decentralized services and blockchain apps: the lack of decentralized, verifiable, and private search. will work with the steering committee to ensure that the symposium program will accommodate presentations for all accepted papers. OSDI '21 Technical Sessions All the times listed below are in Pacific Daylight Time (PDT). The full program will be available in May 2021. Her robot soccer teams have been RoboCup world champions several times, and the CoBot mobile robots have autonomously navigated for more than 1,000km in university buildings. She developed the technology for making network routing self-stabilizing, largely self-managing, and scalable. However, with the increasingly speedy transactions and queries thanks to large memory and fast interconnect, commodity HTAP systems have to make a tradeoff between data freshness and performance degradation. Our approach outperforms existing file systems on a block SSD by a wide margin 6.2 on average for metadata-intensive benchmarks. We built a functional NFSv3 server, called GoNFS, to use GoJournal. However, a plethora of recent data breaches show that even widely trusted service providers can be compromised. Reviews will be available for response on Wednesday, March 3, 2021. However, existing enclave designs fail to meet the requirements of scalability demanded by new scenarios like serverless computing, mainly due to the limitations in their secure memory protection mechanisms, including static allocation, restricted capacity and high-cost initialization. If you are uncertain about how to anonymize your submission, please contact the program co-chairs, osdi21chairs@usenix.org, well in advance of the submission deadline. Log search and log archiving, despite being critical problems, are mutually exclusive. Research Impact Score 9.24. . In this paper, we propose a software-hardware co-design to support dynamic, fine-grained, large-scale secure memory as well as fast-initialization. Four months after we reported the bugs to Geth developers, one of the bugs was triggered on the mainnet, and caused nodes using a stale version of Geth to hard fork the Ethereum blockchain. An evaluation of Addra on a cluster of 80 machines on AWS demonstrates that it can serve 32K users with a 99-th percentile message latency of 726 msa 7 improvement over a prior system for text messaging in the same threat model. This approach misses possible optimization opportunities as transformations that only preserve equivalence on subsets of the output tensors are excluded. OSDI will provide an opportunity for authors to respond to reviews prior to final consideration of the papers at the program committee meeting. Jiachen Wang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Ding Ding, Department of Computer Science, New York University; Huan Wang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Conrad Christensen, Department of Computer Science, New York University; Zhaoguo Wang and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Jinyang Li, Department of Computer Science, New York University. Such centralized engines are in a perfect position to censor content and violate users privacy, undermining some of the key tenets behind decentralization. We develop MAGE, an execution engine for SC that efficiently runs SC computations that do not fit in memory. We introduce a hybrid cryptographic protocol for privacy-adhering transformations of encrypted data. Prior or concurrent publication in non-peer-reviewed contexts, like arXiv.org, technical reports, talks, and social media posts, is permitted. The NVMe zoned namespace (ZNS) is emerging as a new storage interface, where the logical address space is divided into fixed-sized zones, and each zone must be written sequentially for flash-memory-friendly access. We discuss the design and implementation of TEMERAIRE including strategies for hugepage-aware memory layouts to maximize hugepage coverage and to minimize fragmentation overheads. Here, we focus on hugepage coverage. If you have any questions about conflicts, please contact the program co-chairs. While several new GNN architectures have been proposed, the scale of real-world graphsin many cases billions of nodes and edgesposes challenges during model training. People often assume that blockchain has Byzantine robustness, so adding it to any system will make that system super robust against any calamity. We implement and evaluate a suite of applications, including MICA, Raft and Set Algebra for document retrieval; and we demonstrate that the nanoPU can be used as a high performance, programmable alternative for one-sided RDMA operations. Currently, for large graphs, CPU servers offer the best performance-per-dollar over GPU servers. (Visa applications can take at least 30 working days to process.) We develop a prototype of Zeph on Apache Kafka to demonstrate that Zeph can perform large-scale privacy transformations with low overhead. We convert five state-of-the-art PM indexes using Nap. The wire-to-wire RPC response time through the nanoPU is just 69ns, an order of magnitude quicker than the best-of-breed, low latency, commercial NICs. Existing algorithms are designed to work well for certain workloads. We argue that a key-value interface between a file system and an SSD is superior to the legacy block interface by presenting KEVIN. First, it enables a caller to push a message to a callee in two hops, using a new way of assigning mailboxes to users that resembles how a post office assigns PO boxes to its customers. Under different configurations of TPC-C and TPC-E, Polyjuice can achieve throughput numbers higher than the best of existing algorithms by 15% to 56%. The overhead of GPT is 5% for memory-intensive workloads (e.g., Redis) and negligible for CPU-intensive workloads (e.g., RV8 and Coremarks). sosp ACM Symposium on Operating Systems Principles. If your accepted paper should not be published prior to the event, please notify production@usenix.org. Evaluations show that Vegito can perform 1.9 million TPC-C NewOrder transactions and 24 TPC-H-equivalent queries per second simultaneously, which retain the excellent performance of specialized OLTP and OLAP counterparts (e.g., DrTM+H and MonetDB). This kernel is scaled across NUMA nodes using node replication, a scheme inspired by state machine replication in distributed systems. This post is for recording some notes from a few OSDI'21 papers that I got fun. Even the little publishable OS work that is not based on Linux still assumes the same simplistic hardware model (essentially a multiprocessor VAX) that bears little resemblance to modern reality. Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding, University of California, Santa Barbara. Second, it innovates on the underlying cryptographic machinery and constructs a new private information retrieval scheme, FastPIR, that reduces the time to process oblivious access requests for mailboxes. Mothy received a PhD in 1995 from the Computer Laboratory of the University of Cambridge, where he was a principal designer and builder of the Nemesis OS. Welcome to the 2021 USENIX Annual Technical Conference (ATC '21) submissions site! She is the recipient of several best paper awards, the Einstein Chair of the Chinese Academy of Science, the ACM/SIGART Autonomous Agents Research Award, an NSF Career Award, and the Allen Newell Medal for Excellence in Research. GoJournals goal is to bring the advantages of journaling for code to specs and proofs. After request completion, an I/O device must decide either to minimize latency by immediately firing an interrupt or to optimize for throughput by delaying the interrupt, anticipating that more requests will complete soon and help amortize the interrupt cost. 64 papers accepted out of 341 submitted. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. All papers will be available online to registered attendees before the conference. Although the number of submissions is lower than the past, it's likely only due to the late announcement; being in my first OSDI PC, I think the quality of the submitted and accepted papers remains as high as ever. Attaching supplementary material is optional; if your paper says that you have source code or formal proofs, you need not attach them to convince the PC of their existence. Researchers from the Software Systems Laboratory bagged Best Paper Awards at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2021) and the 2021 USENIX Annual Technical Conference (USENIX ATC 2021).. Jay Lepreau Best Paper Award, OSDI'21. Advisor: You have a past or present association as thesis advisor or advisee. Prior or concurrent workshop publication does not preclude publishing a related paper in OSDI. Indeed, it is a prime target for powerful adversaries such as nation states. Our approach effectively eliminates high communication and partitioning overheads, and couples it with a new pipelined push-pull parallelism based execution strategy for fast model training. Finding the inductive invariant of the distributed protocol is a critical step in verifying the correctness of distributed systems, but takes a long time to do even for simple protocols. 23 artifacts received the Artifacts Functional badge (88%). It then feeds those invariants and the desired safety properties to an SMT solver to check if the conjunction of the invariants and the safety properties is inductive. Pollux simultaneously considers both aspects. MAGE outperforms the OS virtual memory system by up to an order of magnitude, and in many cases, runs SC computations that do not fit in memory at nearly the same speed as if the underlying machines had unbounded physical memory to fit the entire computation. You must not improperly identify a PC member as a conflict if none of these three circumstances applies, even if for some other reason you want to avoid them reviewing your paper. We present the nanoPU, a new NIC-CPU co-design to accelerate an increasingly pervasive class of datacenter applications: those that utilize many small Remote Procedure Calls (RPCs) with very short (s-scale) processing times. Our further evaluation on 38 CVEs from 10 commonly-used programs shows that SanRazor reduced checks suffice to detect at least 33 out of the 38 CVEs. Today, privacy controls are enforced by data curators with full access to data in the clear. Submissions may include as many additional pages as needed for references but not for appendices. These scripts often make pages slow to load, partly due to a fundamental inefficiency in how browsers process JavaScript content: browsers make it easy for web developers to reason about page state by serially executing all scripts on any frame in a page, but as a result, fail to leverage the multiple CPU cores that are readily available even on low-end phones. Storm ensures security using a Security Typed ORM that refines the (type) abstractions of each layer of the MVC API with logical assertions that describe the data produced and consumed by the underlying operation and the users allowed access to that data. In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. For instance, the following are not sufficient grounds to specify a conflict with a PC member: they have reviewed the work before, they are employed by your competitor, they are your personal friend, they were your post-doc advisor or advisee, or they had the same advisor as you. Password Jaehyun Hwang and Midhul Vuppalapati, Cornell University; Simon Peter, UT Austin; Rachit Agarwal, Cornell University. NrOS replicates kernel state on each NUMA node and uses operation logs to maintain strong consistency between replicas. This fast path contains programmable hardware support for low latency transport and congestion control as well as hardware support for efficient load balancing of RPCs to cores. Manuela will present examples and discuss the scope of AI in her research in the finance domain. Questions? At a high level, Addra follows a template in which callers and callees deposit and retrieve messages from private mailboxes hosted at an untrusted server. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. Poor data locality hurts an application's performance. Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. The paper review process is double-blind. For example, talks may be shorter than in prior years, or some parts of the conference may be multi-tracked. The ZNS+ also allows each zone to be overwritten with sparse sequential write requests, which enables the LFS to use threaded logging-based block reclamation instead of segment compaction. Jason Mohoney and Roger Waleffe, University of WisconsinMadison; Henry Xu, University of Maryland, College Park; Theodoros Rekatsinas and Shivaram Venkataraman, University of WisconsinMadison. We observe that, due to their intended security guarantees, SC schemes are inherently oblivioustheir memory access patterns are independent of the input data. We compare Marius against two state-of-the-art industrial systems on a diverse array of benchmarks. Because DistAI starts with the strongest possible invariants, if the SMT solver fails, DistAI does not need to discard failed invariants, but knows to monotonically weaken them and try again with the solver, repeating the process until it eventually succeeds. Our evaluation shows that DistAI successfully verifies 13 common distributed protocols automatically and outperforms alternative methods both in the number of protocols it verifies and the speed at which it does so, in some cases by more than two orders of magnitude. Mingyu Li, Jinhao Zhu, and Tianxu Zhang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Cheng Tan, Northeastern University; Yubin Xia, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Sebastian Angel, University of Pennsylvania; Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. And yet, they continue to rely on centralized search engines and indexers to help users access the content they seek and navigate the apps. Her specialties include network routing protocols and network security. We first introduce two new hardware primitives: 1) Guarded Page Table (GPT), which protects page table pages to support page-level secure memory isolation; 2) Mountable Merkle Tree (MMT), which supports scalable integrity protection for secure memory. We also propose two file system techniques for ZNS+-aware LFS. For any further information, please contact the PC chairs: pc-chairs-2022@eurosys.org. Professor Veloso is the Past President of AAAI (the Association for the Advancement of Artificial Intelligence), and the co-founder, Trustee, and Past President of RoboCup.
Alexander County Schools Calendar,
How Many Physical Bitcoins Are There,
Elena Mukhina Injury Video,
How Does A Blizzard Affect The Hydrosphere,
Articles O
osdi 2021 accepted papers